Posted in 2025
17 March 2025 - Reducing Memory Pressure for Xarray + Dask Workloads
02 January 2025 - Coiled 2024 in Review
Posted in 2024
17 December 2024 - Faster Xarray Quantile Computations with Dask
21 November 2024 - Improving GroupBy.map with Dask and Xarray
19 November 2024 - SLURM-Style Job Arrays on the Cloud with Coiled
12 November 2024 - Airflow, Dask, & Coiled: Adding Big Data Processing to Your Cloud Toolkit
16 October 2024 - Large Scale Geospatial Benchmarks: First Pass
09 October 2024 - Scaling AI-Based Data Processing with Hugging Face + Dask
09 September 2024 - Large Scale Geospatial Benchmarks
14 May 2024 - DataFrames at Scale Comparison: TPC-H
14 May 2024 - Dask DataFrame is Fast Now
19 April 2024 - Dask vs. Spark
08 April 2024 - Easy Scalable Production ETL
05 February 2024 - One Trillion Row Challenge
31 January 2024 - Real-world Grocery Demand Forecasting
23 January 2024 - Schedule Python Jobs with Prefect and Coiled
16 January 2024 - One Billion Row Challenge (1BRC) in Python with Dask
Posted in 2023
21 December 2023 - Xarray at Large Scale: A Beginner’s Guide
17 November 2023 - Process Hundreds of GB of Data in the Cloud with Polars
01 November 2023 - Processing Terabyte-Scale NASA Cloud Datasets with Coiled
10 October 2023 - Run Jupyter Notebooks on a GPU on the Cloud
06 October 2023 - Ten Cents Per Terabyte
05 October 2023 - TPC-H Benchmarks for Query Optimization with Dask Expressions
19 September 2023 - Coiled observability wins: Chunksize
07 September 2023 - Parallel Serverless Functions at Scale
05 September 2023 - Processing a 250 TB dataset with Coiled, Dask, and Xarray
01 September 2023 - Reduce training time for CPU intensive models with scikit-learn and Coiled Functions
23 August 2023 - Fine Performance Metrics and Spans
10 August 2023 - Data-proximate Computing with Coiled Functions
09 August 2023 - Dask, Dagster, and Coiled for Production Analysis at OnlineApp
07 August 2023 - Process Hundreds of GB of Data with DuckDB in the Cloud
04 August 2023 - High Level Query Optimization in Dask
01 August 2023 - Easy Heavyweight Serverless Functions
14 June 2023 - Coiled notebooks
05 June 2023 - Utilizing PyArrow to improve pandas and Dask workflows
18 May 2023 - Distributed printing
16 May 2023 - Observability for Distributed Computing with Dask
15 May 2023 - GIL monitoring in Dask
05 May 2023 - Performance testing at Coiled
05 May 2023 - How well does Dask run on Graviton?
18 April 2023 - Upstream testing in Dask
15 March 2023 - Shuffling large data at constant memory in Dask
23 February 2023 - Just in time Python environments
17 January 2023 - How many PEPs does it take to install a package?
06 January 2023 - Scaling Hyperparameter Optimization With XGBoost, Optuna, and Dask
06 January 2023 - Handling Unexpected AWS IAM Changes
06 January 2023 - AWS Cost Explorer Tips and Tricks
Posted in 2022
19 December 2022 - Automated Data Pipelines On Dask With Coiled & Prefect
09 February 2022 - Reading CSV files into Dask DataFrames with read_csv
01 January 2022 - Writing Parquet Files with Dask using to_parquet
01 January 2022 - Why we passed on Kubernetes
01 January 2022 - Understanding Managed Dask (Dask as a Service)
01 January 2022 - Tackling unmanaged memory with Dask
01 January 2022 - Speed up a pandas query 10x with these 6 Dask DataFrame tricks
01 January 2022 - Spark to Dask: The Good, Bad, and Ugly of Moving from Spark to Dask
01 January 2022 - Snowflake and Dask: a Python Connector for Faster Data Transfer
01 January 2022 - Seven Stages of Open Software
01 January 2022 - Setting a Dask DataFrame index
01 January 2022 - Search at Grubhub and User Intent
01 January 2022 - Scale your data science workflows with Python and Dask
01 January 2022 - Save Money with Spot
01 January 2022 - Repartitioning Dask DataFrames
01 January 2022 - Reducing memory usage in Dask workloads by 80%
01 January 2022 - Reduce memory usage with Dask dtypes
01 January 2022 - PyArrow Strings in Dask DataFrames
01 January 2022 - Prioritizing Pragmatic Performance for Dask
01 January 2022 - Perform a Spatial Join in Python
01 January 2022 - Introducing the Dask Active Memory Manager
01 January 2022 - How to Merge Dask DataFrames
01 January 2022 - How to Convert a pandas Dataframe into a Dask Dataframe
01 January 2022 - How Coiled sets memory limit for Dask workers
01 January 2022 - Filtering Dask DataFrames with loc
01 January 2022 - Enterprise Dask Support
01 January 2022 - Easily Run Python Functions in Parallel
01 January 2022 - Double River: Enhanced Algorithmic Trading Performance
01 January 2022 - Dask on GCP
01 January 2022 - Dask on Azure
01 January 2022 - Dask on AWS
01 January 2022 - Dask for Parallel Python
01 January 2022 - Dask and the PyData Stack
01 January 2022 - Dask Read Parquet Files into DataFrames with read_parquet
01 January 2022 - Creating Disk Partitioned Lakes with Dask using partition_on
01 January 2022 - Cost Savings with Dask and Coiled
01 January 2022 - Convert Large JSON to Parquet with Dask
01 January 2022 - Coiled, one year in
01 January 2022 - Coiled Cloud Architecture
01 January 2022 - Code Formatting Jupyter Notebooks with Black
01 January 2022 - Better Shuffling in Dask: a Proof-of-Concept
01 January 2022 - Automate your ETL Jobs in the Cloud with Github Actions, S3 and Coiled
01 January 2022 - Accelerating Microstructural Analytics with Dask and Coiled
01 January 2022 - Abalone Bio: Accelerating Antibody Discovery
Posted in 2021
22 November 2021 - Pandas parallel apply and map with Dask DataFrame
01 October 2021 - Converting a Dask DataFrame to a pandas DataFrame