Posted in 2024

Faster Xarray Quantile Computations with Dask

Dec 17, 2024

Read more ...


Improving GroupBy.map with Dask and Xarray

Nov 21, 2024

Read more ...


SLURM-Style Job Arrays on the Cloud with Coiled

Nov 19, 2024

Read more ...


Airflow, Dask, & Coiled: Adding Big Data Processing to Your Cloud Toolkit

Nov 12, 2024

Read more ...


Large Scale Geospatial Benchmarks: First Pass

We implement several large-scale geo benchmarks. Most break. Fun!

Read more ...


Scaling AI-Based Data Processing with Hugging Face + Dask

Oct 9, 2024

Read more ...


Large Scale Geospatial Benchmarks

Sep 9, 2024

Read more ...


DataFrames at Scale Comparison: TPC-H

May 14, 2024

Read more ...


Dask DataFrame is Fast Now

May 14, 2024

Read more ...


Dask vs. Spark

Apr 19, 2024

Bar chart comparing the relative difference in TPC-H query runtime for Dask vs. PySpark when executed on an M1 MacBook Pro with 8 cores. Orange represents queries where Dask is faster and blue where PySpark is faster.

Read more ...


Easy Scalable Production ETL

We show a lightweight scalable data pipeline that runs large Python jobs on a schedule on the cloud.

Read more ...


One Trillion Row Challenge

Feb 5, 2024

Read more ...


Real-world Grocery Demand Forecasting

Jan 31, 2024

Line graph of forecasted sales and actual sales over time.

Read more ...


Schedule Python Jobs with Prefect and Coiled

Jan 23, 2024

Read more ...


One Billion Row Challenge (1BRC) in Python with Dask

Jan 16, 2024

Read more ...