Creating intuitive, interactive web applications to visualize Big Data with Streamlit, Dask, and Coiled


  1. a heavier groupby computation,
  2. interactive widgets to scale the Coiled cluster up or down, and
  3. an option to shut down the cluster on demand.

Using unsupervised NLP topic modelling and clustering to build a machine learning classifier that can identify misinformation in short-text documents

A comparative analysis of two NLP topic modelling approaches for short-text documents, using Arabic Twitter data

How to create delayed objects around functions you only want to run once per worker

Use Case

Pre-processing Arabic text for machine learning using the camel-tools Python package

  1. The form of characters and spelling of words can vary depending on their context (fancy…

  1. Use distributed processing to work with larger-than-memory datasets hosted in the cloud,
  2. Work with Arabic Natural Language Processing, and
  3. To do that, I’ll have to wrap my brain cells around working with deep-learning networks using TensorFlow.

The Dataset

Data for Change

Building a machine learning model to predict the intensity of conflicts using a century of climate change data

…dots. LOTS of dots.

Richard Pelgrim

Data Scientist & Fermentation Fanatic | Data Science Evangelist Intern @

