How to create delayed objects around functions you only want to run once per worker

Photo by Joshua Sortino on Unsplash

tl;dr

This is a utility to create dask.delayed objects around functions that you only ever want to run once per distributed worker. It is useful when you have some large data baked into your Docker image and need to use that data as auxiliary input to another Dask operation. Rather than transferring the serialised data between workers in the cluster, which is slow because of the size of the data, the utility lets you call the parsing function once per worker and then use the same parsed object downstream.
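To make the idea concrete, here is a minimal sketch of the underlying pattern in plain Python: memoise the expensive loader inside the process, so every task that runs in that process reuses the same parsed object. This is not the utility's actual implementation; `load_auxiliary_data`, `process_partition`, and the path `/data/aux.bin` are hypothetical names chosen for illustration, and the "parsing" is faked with a small in-memory object.

```python
import functools
import os

# Counts how often the expensive loader really runs in this process.
CALLS = {"count": 0}


@functools.lru_cache(maxsize=None)
def load_auxiliary_data(path):
    """Hypothetical stand-in for an expensive parse of data baked into the image."""
    CALLS["count"] += 1
    # A real loader would read and parse the file at `path` here.
    return {"path": path, "pid": os.getpid(), "vocab": frozenset({"a", "b", "c"})}


def process_partition(words, path="/data/aux.bin"):
    """Downstream task: fetch the per-process cached object, then use it."""
    aux = load_auxiliary_data(path)
    return [w for w in words if w in aux["vocab"]]


# Three "tasks" in the same process trigger only one load.
results = [process_partition(p) for p in (["a", "x"], ["b", "c"], ["z"])]
```

On a cluster, each worker process would hold its own cache, so the loader would run once per worker process and only the small task inputs and outputs would need to be serialised. One caveat of this sketch: if several threads in one worker hit the loader at the same moment, it may run more than once before the cache is filled.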

See use case below.

Use Case

For my Arabic-Language…


Pre-processing Arabic text for machine-learning using the camel-tools Python package

image under license to Richard Pelgrim

In this article, I provide a concise and to-the-point overview of the challenges of working with Arabic text in NLP projects…and the tools available to overcome them. I rely heavily on the camel-tools Python package developed at the NYU Abu Dhabi CAMeL Lab and this excellent webinar by its director, Dr. Nizar Habash. Big shout-out to them for doing groundbreaking work in the field and making their tools accessible to the public!

Challenges

Working with Arabic text in NLP projects presents (at least) 5 unique challenges:

  1. The form of characters and spelling of words can vary depending on their context (fancy…


It’s time for my second and final capstone project, with which I’ll be completing my Springboard Data Science Career Track. Cat believe I’m almost at the end of this thing already; these 6 months have paw-n by.

For my final project, I’m setting myself some technical challenges — things I want to learn that go beyond the curriculum. Specifically, I want to:

  1. Use distributed processing to work with larger-than-memory datasets hosted in the cloud,
  2. Work with Arabic Natural Language Processing, and
  3. Wrap my brain cells around working with deep-learning networks using TensorFlow, which I’ll need in order to do that.

The Dataset

Large Arabic…


Data for Change

Building a machine learning model to predict the intensity of conflicts using a century of climate change data

Image via iStock under license to Richard Pelgrim

This story is part of a linked series tracking my progress through my first independent data science project. Find the previous post here and Jupyter Notebooks here.

tl;dr

Climate change is leading to increased political tensions and, some researchers speculate, is therefore driving increased armed conflict across the world. This project attempts to build a machine learning model to predict conflict intensity (measured as number of deaths per day) in India based on available precipitation and temperature data from the surrounding area (< 300 km). The project concludes that it is not possible to accurately predict conflict intensity using local climate data


This story is part of a linked series tracking my progress through my first independent data science project. Find the previous post here, next post here, and Jupyter Notebooks here.

Last week, I officially hit the 50% mark on my Springboard Data Science Career Track curriculum. That means I’ve put in (at least) 300 hours of work so far.

So…what do you have to show for it, Richard?!

Is this where I show off my fancy coding skills and rave on about all of the technical lingo I’ve mastered to prove to you that I really am one bad-ass panda-wrangling…


This story is part of a linked series documenting my progress through my first independent data science project. Find the previous post here and Jupyter Notebooks here.

I’ve got some first results to show! Very early days, and I’m still mostly wrangling my datasets into shape, but I’ve got some maps; and as we all know, where there are maps there’s a good chance there will be………………

…dots. LOTS of dots.

Three-hundred-and-forty-thousand-four-hundred-and-sixty-nine dots, to be exact. The blue dots are all the GHCN weather stations; the orange ones are the UCDP conflict incidents. Even just eyeballing this, it’s clear that some geographical…


Yes, it’s true. I wrangle with Pandas. On the daily.

Except my Pandas are purely digital, imported into my digital wrangling environment with a simple line of code. I don’t even break a sweat.

And to make sure I really don’t exert myself too much here, I even chop a six-letter word into a two-letter abbreviation. That’s how lazy (accomplished!) a wrangler I have become in the span of just a few weeks.

But what about all that cute, fuzzy fur, I hear you ask? What’s the point of wrangling pandas if you can’t bury your face into that…


This week I’ll be starting work on my first independent data science project. After quite a few rabbit hole sessions throughout the internet, I’ve finally settled down on a topic: I’ll be exploring the correlations between incidents of armed conflict and measures of climate change.

For some background, here’s a short video describing a Stanford study on the topic, published last year. You can find links to an article about the study and the study itself at the end of this post.

Disclaimer: just want to put out there that I’m neither an expert on climate change nor on…


Last week I attended the Post-human Territories session as part of Amsterdam’s FIBER Festival. The session explored the influence and impact of AI (specifically, machine learning) in environmentalist and humanitarian endeavours.

By far the most inspiring contribution to the evening was that of engineer and artist Tega Brain who shared her project Deep Swamp.

http://www.tegabrain.com/deep-swamp

The installation consists of three semi-submerged micro ‘landscapes’ which are each managed by a single machine learning system which uses image recognition to evaluate its own performance and tweak the landscape that has been placed under its care. Nothing mind-blowing, you might think. …


Just a brief interruption to announce some breaking news:

A mere 17 years after releasing Where is the Love?, Fergie has finally decided to put her money where her mouth is. She’s going to take her own question seriously and has enrolled herself for a Ph.D. in Migration Studies.

Because she knows that there is really only one hump worth talking about:

The migration hump, of course.

The ‘migration hump’ is a well-known and very influential theory describing the relationship between economic development and migration. It is based on research that indicates that when poor countries get richer, migration from those countries increases. This…

Richard Pelgrim

Data Scientist & Communicator | M.Sc. Human Geography & Planning
