Posts by Matthew Rocklin

DataFrames at Scale Comparison: TPC-H

We run benchmarks derived from the TPC-H benchmark suite on a variety of scales, hardware architectures, and dataframe projects, notably Apache Spark, Dask, DuckDB, and Polars. No project wins.

Read more ...


Ten Cents Per Terabyte

The optimal cost of cloud computing

A back-of-the-envelope calculation for a simple workload bound primarily by network bandwidth: 1 TB / (60 MB/s) / (3600 s/hr) * $0.02/hr ≈ $0.10
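
The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), and the function name `cost_per_terabyte` is made up for illustration. The 60 MB/s bandwidth and $0.02/hr price are the figures from the teaser.

```python
# Assumed decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
TB = 10**12
MB = 10**6


def cost_per_terabyte(bandwidth_mb_per_s: float, price_per_hour: float) -> float:
    """Dollar cost to move 1 TB through a machine limited by network bandwidth."""
    seconds = TB / (bandwidth_mb_per_s * MB)  # time to transfer 1 TB
    hours = seconds / 3600                    # convert seconds to hours
    return hours * price_per_hour             # hours times hourly machine price


cost = cost_per_terabyte(bandwidth_mb_per_s=60, price_per_hour=0.02)
print(f"${cost:.4f} per TB")
```

Under these assumptions the result comes out a little over nine cents, which rounds to the ten cents in the title.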

Read more ...


Easy Heavyweight Serverless Functions

What is the easiest way to run Python code in the cloud, especially for compute jobs?

Read more ...