Posts by Patrick Hoefler

Dask DataFrame is Fast Now

Patrick Hoefler

Read more ...


TPC-H Benchmarks for Query Optimization with Dask Expressions

Dask-expr is an ongoing effort to add a logical query optimization layer to Dask DataFrames. We can now share the first benchmark results, which were run against the current DataFrame implementation.

Read more ...


Coiled observability wins: Chunksize

Distributed computing is hard; distributed debugging is even harder. Dask tries to simplify this process as much as possible. Coiled adds observability features for your Dask clusters and processes the data they collect to help users understand their workflows better.

Read more ...


Reduce training time for CPU intensive models with scikit-learn and Coiled Functions

Patrick Hoefler

Read more ...


Process Hundreds of GB of Data with DuckDB in the Cloud

Patrick Hoefler

Read more ...


High Level Query Optimization in Dask

Dask DataFrame doesn’t currently optimize your code for you the way Spark or a SQL database would. This means that users waste a lot of computation. Let’s look at a common example that looks OK at first glance but is actually pretty inefficient.

Read more ...


How to Train a Neural Network on a GPU in the Cloud with Coiled Functions

Patrick Hoefler

Read more ...


Dask performance benchmarking put to the test: Fixing a pandas bottleneck

Patrick Hoefler, Hendrik Makait

Read more ...


Utilizing PyArrow to improve pandas and Dask workflows

Patrick Hoefler

Read more ...