Posts by James Bourbeau

Large Scale Geospatial Benchmarks: First Pass

We implement several large-scale geo benchmarks. Most break. Fun!



Scaling AI-Based Data Processing with Hugging Face + Dask

Sarah Johnson, James Bourbeau, Quentin Lhoest, Daniel van Strien



Large Scale Geospatial Benchmarks

James Bourbeau, Matt Rocklin



Easy Scalable Production ETL

We show a lightweight, scalable data pipeline that runs large Python jobs on a schedule in the cloud.

Scalable data pipeline example that runs regularly scheduled jobs on the cloud.



Schedule Python Jobs with Prefect and Coiled

James Bourbeau



Processing Terabyte-Scale NASA Cloud Datasets with Coiled

We show how to run existing NASA data workflows on the cloud, in parallel, with minimal code changes using Coiled. We also discuss cost optimization.

Comparing cost and duration between running the same workflow locally on a laptop, running on AWS, and running with cost optimizations on AWS.



Parallel Serverless Functions at Scale

The cloud offers amazing scale, but it can be difficult for Python data developers to use. This post walks through how to use Coiled Functions to run your existing code in parallel on the cloud with minimal code changes.

Comparing code runtime between a laptop, single cloud VM, and multiple cloud VMs in parallel



Data-proximate Computing with Coiled Functions

Coiled Functions make it easy to improve performance and reduce costs by moving your computations next to your cloud data.



Coiled notebooks

We recently released a new, experimental notebooks feature for easily launching Jupyter servers in the cloud from your local machine.



Distributed printing

Dask makes it easy to print output whether you’re running code locally on your laptop or remotely on a cluster in the cloud.



Upstream testing in Dask

Dask has deep integrations with other libraries in the PyData ecosystem like NumPy, pandas, Zarr, PyArrow, and more. Part of providing a good experience for Dask users is making sure that Dask continues to work well with this community of libraries as they push out new releases. This post walks through how Dask maintainers proactively ensure Dask continuously works with its surrounding ecosystem.
