Posted in 2024

Faster Xarray Quantile Computations with Dask

Patrick Hoefler

Read more ...


Improving GroupBy.map with Dask and Xarray

Patrick Hoefler

Read more ...


SLURM-Style Job Arrays on the Cloud with Coiled

Matthew Rocklin

Read more ...


Airflow, Dask, & Coiled: Adding Big Data Processing to Your Cloud Toolkit

Stephen Schneider, Franco Bosetti

Read more ...


Large Scale Geospatial Benchmarks: First Pass

We implement several large-scale geo benchmarks. Most break. Fun!

Read more ...


Scaling AI-Based Data Processing with Hugging Face + Dask

Sarah Johnson, James Bourbeau, Quentin Lhoest, Daniel van Strien

Read more ...


Large Scale Geospatial Benchmarks

James Bourbeau, Matt Rocklin

Read more ...


DataFrames at Scale Comparison: TPC-H

Hendrik Makait, Sarah Johnson, Matthew Rocklin

Read more ...


Dask DataFrame is Fast Now

Patrick Hoefler

Read more ...


Dask vs. Spark

Sarah Johnson, Florian Jetter

Bar chart comparing the relative difference in TPC-H query runtime for Dask vs. PySpark when executed on an M1 MacBook Pro with 8 cores. Orange represents queries where Dask is faster and blue represents queries where PySpark is faster.

Read more ...


Easy Scalable Production ETL

We show a lightweight scalable data pipeline that runs large Python jobs on a schedule on the cloud.

Scalable data pipeline example that runs regularly scheduled jobs on the cloud.

Read more ...


One Trillion Row Challenge

Sarah Johnson

Read more ...


Real-world Grocery Demand Forecasting

Jack Solomon

Line graph of forecasted sales and actual sales over time.

Read more ...


Schedule Python Jobs with Prefect and Coiled

James Bourbeau

Read more ...


One Billion Row Challenge (1BRC) in Python with Dask

Sarah Johnson

Read more ...