Posts by Patrick Hoefler

Dask-expr is an ongoing effort to add a logical query optimization layer to Dask DataFrames. We can now share the first results from benchmarks run against the current DataFrame implementation.

Read more ...

Coiled observability wins: Chunksize

19 September 2023

Distributed computing is hard; distributed debugging is even harder. Dask tries to simplify this process as much as possible. Coiled adds observability features for your Dask clusters and processes what they capture to help users understand their workflows better.

Read more ...

Reduce training time for CPU intensive models with scikit-learn and Coiled Functions

01 September 2023

Read more ...

Process Hundreds of GB of Data with DuckDB in the Cloud

07 August 2023

Read more ...

High Level Query Optimization in Dask

04 August 2023

Dask DataFrame doesn’t currently optimize your code for you (as Spark or a SQL database would). This means that users waste a lot of computation. Let’s look at a common example that looks fine at first glance but is actually pretty inefficient.

Read more ...

How to Train a Neural Network on a GPU in the Cloud with coiled functions

24 July 2023

Read more ...

Dask performance benchmarking put to the test: Fixing a pandas bottleneck

23 June 2023

Read more ...

Utilizing PyArrow to improve pandas and Dask workflows

05 June 2023

Read more ...