Posts tagged performance
TPC-H Benchmarks for Query Optimization with Dask Expressions
- 05 October 2023
Dask-expr is an ongoing effort to add a logical query optimization layer to Dask DataFrames. We now have our first benchmark results to share, run against the current DataFrame implementation.
Fine Performance Metrics and Spans
- 23 August 2023
While it’s trivial to measure the end-to-end runtime of a Dask workload, the next logical step (breaking down this time to understand whether it could be faster) has historically been far more arduous, requiring a lot of intuition and legwork from novice and expert users alike. We wanted to change that.
High Level Query Optimization in Dask
- 04 August 2023
Dask DataFrame doesn’t currently optimize your code for you (like Spark or a SQL database would). This means that users waste a lot of computation. Let’s look at a common example that looks OK at first glance but is actually pretty inefficient.
Dask performance benchmarking put to the test: Fixing a pandas bottleneck
- 23 June 2023
Patrick Hoefler, Hendrik Makait
Performance testing at Coiled
- 05 May 2023
At Coiled we develop Dask and automatically deploy it to large clusters of cloud workers (sometimes 1000+ EC2 instances at once!). To avoid surprises when we publish a new release, Dask needs to be covered by a comprehensive battery of tests for both functionality and performance.