Posted in 2023

Xarray at Large Scale: A Beginner’s Guide

21 December 2023

Read more ...

Polars queries in the cloud with Coiled

17 November 2023

Code snippet of using the coiled.function decorator to run a query with Polars on a large VM in the cloud

Read more ...

Processing Terabyte-Scale NASA Cloud Datasets with Coiled

01 November 2023

We show how to run existing NASA data workflows on the cloud, in parallel, with minimal code changes using Coiled. We also discuss cost optimization.

Comparing cost and duration between running the same workflow locally on a laptop, running on AWS, and running with cost optimizations on AWS.

Read more ...

Run Jupyter Notebooks on a GPU on the Cloud

10 October 2023

Read more ...

Ten Cents Per Terabyte

06 October 2023

Read more ...

TPC-H Benchmarks for Query Optimization with Dask Expressions

05 October 2023

Dask-expr is an ongoing effort to add a logical query optimization layer to Dask DataFrames. We can now share the first benchmark results, which were run against the current DataFrame implementation.

Read more ...

Coiled observability wins: Chunksize

19 September 2023

Distributed computing is hard; distributed debugging is even harder. Dask tries to simplify this process as much as possible. Coiled adds observability features for your Dask clusters and processes the resulting data to help users understand their workflows better.

Read more ...

Parallel Serverless Functions at Scale

07 September 2023

Comparing code runtime between a laptop, single cloud VM, and multiple cloud VMs in parallel

Read more ...

Processing a 250 TB dataset with Coiled, Dask, and Xarray

05 September 2023

We processed 250 TB of geospatial cloud data in twenty minutes with Xarray, Dask, and Coiled. We did this to demonstrate scale and to think about costs.

County-level heat map of the continental US showing mean depth to soil saturation (in meters) in 2020.

Read more ...

Reduce training time for CPU intensive models with scikit-learn and Coiled Functions

01 September 2023

Read more ...

Fine Performance Metrics and Spans

23 August 2023

While it’s trivial to measure the end-to-end runtime of a Dask workload, the next logical step - breaking down this time to understand if it could be faster - has historically been a much more arduous task that required a lot of intuition and legwork, for novice and expert users alike. We wanted to change that.

Populated Fine Performance Metrics dashboard

Read more ...

Data-proximate Computing with Coiled Functions

10 August 2023

Coiled Functions make it easy to improve performance and reduce costs by moving your computations next to your cloud data.

Read more ...

Dask, Dagster, and Coiled for Production Analysis at OnlineApp

09 August 2023

We show a simple integration between Dagster and Dask+Coiled. We discuss how this made a common problem, processing a large set of files every month, really easy.

Conceptual diagram showing how to use Dagster with Coiled and Dask.

Read more ...

Process Hundreds of GB of Data with DuckDB in the Cloud

07 August 2023

Read more ...

High Level Query Optimization in Dask

04 August 2023

Dask DataFrame doesn’t currently optimize your code for you (as Spark or a SQL database would). This means that users waste a lot of computation. Let’s look at a common example that looks OK at first glance but is actually pretty inefficient.

Read more ...

Easy Heavyweight Serverless Functions

01 August 2023

What is the easiest way to run Python code in the cloud, especially for compute jobs?

Read more ...

How to Train a Neural Network on a GPU in the Cloud with Coiled Functions

24 July 2023

Read more ...

Dask performance benchmarking put to the test: Fixing a pandas bottleneck

23 June 2023

Read more ...

Coiled notebooks

14 June 2023

We recently pushed out a new, experimental notebooks feature for easily launching Jupyter servers in the cloud from your local machine. We’re excited about Coiled notebooks because they:

Read more ...

Utilizing PyArrow to improve pandas and Dask workflows

05 June 2023

Read more ...

Distributed printing

18 May 2023

Dask makes it easy to print, whether you’re running code locally on your laptop or remotely on a cluster in the cloud.

Read more ...

Observability for Distributed Computing with Dask

16 May 2023

Read more ...

GIL monitoring in Dask

15 May 2023

Read more ...

Performance testing at Coiled

05 May 2023

At Coiled we develop Dask and automatically deploy it to large clusters of cloud workers (sometimes 1000+ EC2 instances at once!). In order to avoid surprises when we publish a new release, Dask needs to be covered by a comprehensive battery of tests — both for functionality and performance.

Read more ...

How well does Dask run on Graviton?

05 May 2023

Bar chart of AWS cost vs. processor type.

Read more ...

Upstream testing in Dask

18 April 2023

Dask has deep integrations with other libraries in the PyData ecosystem like NumPy, pandas, Zarr, PyArrow, and more. Part of providing a good experience for Dask users is making sure that Dask continues to work well with this community of libraries as they push out new releases. This post walks through how Dask maintainers proactively ensure Dask continuously works with its surrounding ecosystem.

Read more ...

Burstable vs non-burstable AWS instance types for data engineering workloads

04 April 2023

Read more ...

Shuffling large data at constant memory in Dask

15 March 2023

Read more ...

Just in time Python environments

23 February 2023

Read more ...

How many PEPs does it take to install a package?

17 January 2023

A few months ago we released package sync, a feature that takes your Python environment and replicates it in the cloud with zero effort.

Read more ...

Scaling Hyperparameter Optimization With XGBoost, Optuna, and Dask

06 January 2023

XGBoost is one of the most well-known libraries among data scientists, having become one of the top choices among Kaggle competitors. It is performant in a wide array of supervised machine learning problems, implements scalable training through the rabit library, and integrates with many big data processing tools, including Dask.

Read more ...

Handling Unexpected AWS IAM Changes

06 January 2023

The cloud is tricky! You might think the rules that determine which IAM permissions are required for which actions will continue to apply in the same way. You might think they’d apply the same way to different AWS accounts. Or that if these things aren’t true, at least AWS will let you know. (I did.) You’d be wrong!

Read more ...

AWS Cost Explorer Tips and Tricks

06 January 2023

Read more ...