Richard Pelgrim

222 Followers

Published in Towards Data Science

·Pinned

The Beginner’s Guide to Distributed Computing

7 Fundamental Concepts to Succeed With Distributed Computing in Python. Enter the Distributed Universe: More and more data scientists are venturing into the world of distributed computing to scale up their computations and process larger datasets faster. Starting your distributed computing journey, however, can feel a bit like entering an alternate universe: overwhelming, intimidating and confusing. But here’s the good news: you don’t need…

Data Science

12 min read


Jun 1, 2022

Not Everyone Can Become a Data Scientist

Why we need to talk more openly about privilege — This one’s going to be short and to the point, because I’m actually quite upset. I’m upset about privilege. Or, to be more precise, I’m upset about how little we talk about privilege in the data industry. A lot of us act like the tech industry is this golden land…

Data Science

3 min read


Published in Towards Data Science

·May 17, 2022

Accessing the NYC Taxi Data in 2022

Everything you need to know about the recent changes: As of May 13, 2022, access to the NYC Taxi data has changed. Parquet is now the default file format, instead of CSV. Practically, this means you will need to change two things in your code: (1) change the path to the S3 bucket; (2) use the dd.read_parquet() method instead…

Python

5 min read


Apr 1, 2022

Julia vs Python for Data Science in 2022

Comparing Programming Languages for Data Science: This article compares Julia to Python in terms of general performance, package availability and adoption, and gives guidance on whether you should consider learning it. Know Your Programming Languages for Data Science: In 2021, Python achieved the #1 ranking in the TIOBE Index of programming languages for the second year in a row. This should come as no…

Data

5 min read


Published in Towards Data Science

·Feb 10, 2022

5 Rookie Mistakes to Avoid when Using Dask

Strategies for Successful Distributed Computing in Python: Using Dask for the first time can involve a steep learning curve. This post presents the 5 most common mistakes people make when using Dask, along with strategies for avoiding them. Let’s jump in. 1. “Dask is basically pandas, right?”: The single most important thing to do before starting to build things with…

Python

7 min read


Published in Towards Data Science

·Jan 7, 2022

How to Build Powerful Airflow DAGs for Big Data Workflows in Python

Scale your Airflow pipelines to the cloud. Airflow DAGs for (Really!) Big Data: Apache Airflow is one of the most popular tools for orchestrating data engineering, machine learning, and DevOps workflows. But it has one important drawback. Out of the box, Airflow will run your computations locally, which means you can only process datasets that fit within the resources of your machine. To use Airflow for…

Data Science

5 min read


Published in Towards Data Science

·Jan 5, 2022

Why You Should Save NumPy Arrays with Zarr

Read and Write Arrays Faster with Dask — tl;dr This post tells you why and how to use the Zarr format to save your NumPy arrays. It walks you through the code to read and write large NumPy arrays in parallel using Zarr and Dask. Here’s the code if you want to jump right in. If you have questions…

Data Science

5 min read


Published in Towards Data Science

·Dec 25, 2021

How to Write NumPy Arrays to CSV Files

And why you should consider other file formats: This post explains how to write NumPy arrays to CSV files. We will look at: (1) the syntax for writing different NumPy arrays to CSV; (2) the limitations of writing NumPy arrays to CSV; (3) alternative ways to save NumPy arrays. Let’s get to it. Writing NumPy Arrays to CSV: You can use the np.savetxt() method to save…

Data Science

4 min read
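
The teaser above mentions np.savetxt() for writing arrays to CSV. As a minimal sketch of that call (not taken from the article itself), the snippet below writes a small 2-D array to a CSV file and reads it back. The array contents, the file name example.csv, and the fmt string are illustrative assumptions.

```python
import os
import tempfile

import numpy as np

# Hypothetical 2-D array, used only for illustration.
arr = np.arange(6, dtype=float).reshape(2, 3)

# Write the array to a CSV file in a temporary directory.
# The file name "example.csv" is an assumption for this sketch.
path = os.path.join(tempfile.mkdtemp(), "example.csv")
np.savetxt(path, arr, delimiter=",", fmt="%.1f")

# Read the file back to check that the round trip preserves the values.
loaded = np.loadtxt(path, delimiter=",")
print(loaded)
```

Note that np.savetxt() accepts only 1-D and 2-D arrays, which is one reason a format other than CSV may be a better fit for higher-dimensional data.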


Published in Towards Data Science

·Dec 20, 2021

Parallel XGBoost with Dask in Python

Machine Learning that Scales to Very Large Datasets: tl;dr Out of the box, XGBoost cannot be trained on datasets larger than your computer’s memory; Python will throw a MemoryError. This tutorial will show you how to go beyond the limits of your local machine by using distributed XGBoost with Dask, with only minor changes to your existing code. Specifically, you will…

Data

6 min read


Published in Python in Plain English

·Dec 16, 2021

3 Things I Did to Become a Data Science Evangelist in One Year

Essential soft skills they don’t teach you in tech bootcamps. — It’s kind of mind-boggling to think that I wrote my first-ever Python code just over a year ago! Especially since these days, I work at Coiled.io as a Data Science Evangelist, writing about all things PyData for a living. Below I’ll share the 3 things I did that helped me…

Data Science

4 min read

Mindful techie crunching data at scale | Connect: https://www.linkedin.com/in/richard-pelgrim/ | Unlimited Reads: https://richardpelgrim.medium.com/membership
