Posts by Patrick Hoefler
TPC-H Benchmarks for Query Optimization with Dask Expressions
- 05 October 2023
Dask-expr is an ongoing effort to add a logical query optimization layer to Dask DataFrames. We can now share the first benchmark results, which compare it against the current DataFrame implementation.
Coiled observability wins: Chunksize
- 19 September 2023
Distributed computing is hard; distributed debugging is even harder. Dask tries to simplify this process as much as possible. Coiled adds observability features for your Dask clusters and processes the resulting data to help users understand their workflows better.
Reduce training time for CPU intensive models with scikit-learn and Coiled Functions
- 01 September 2023
Patrick Hoefler
High Level Query Optimization in Dask
- 04 August 2023
Dask DataFrame doesn’t currently optimize your code for you (like Spark or a SQL database would). This means that users waste a lot of computation. Let’s look at a common example that looks OK at first glance but is actually pretty inefficient.
How to Train a Neural Network on a GPU in the Cloud with Coiled Functions
- 24 July 2023
Patrick Hoefler
Dask performance benchmarking put to the test: Fixing a pandas bottleneck
- 23 June 2023
Patrick Hoefler, Hendrik Makait