Posted in 2023

21 December 2023 - Xarray at Large Scale: A Beginner’s Guide

17 November 2023 - Process Hundreds of GB of Data in the Cloud with Polars

01 November 2023 - Processing Terabyte-Scale NASA Cloud Datasets with Coiled

10 October 2023 - Run Jupyter Notebooks on a GPU on the Cloud

06 October 2023 - Ten Cents Per Terabyte

05 October 2023 - TPC-H Benchmarks for Query Optimization with Dask Expressions

19 September 2023 - Coiled observability wins: Chunksize

07 September 2023 - Parallel Serverless Functions at Scale

05 September 2023 - Processing a 250 TB dataset with Coiled, Dask, and Xarray

01 September 2023 - Reduce training time for CPU intensive models with scikit-learn and Coiled Functions

23 August 2023 - Fine Performance Metrics and Spans

10 August 2023 - Data-proximate Computing with Coiled Functions

09 August 2023 - Dask, Dagster, and Coiled for Production Analysis at OnlineApp

07 August 2023 - Process Hundreds of GB of Data with DuckDB in the Cloud

04 August 2023 - High Level Query Optimization in Dask

01 August 2023 - Easy Heavyweight Serverless Functions

24 July 2023 - How to Train a Neural Network on a GPU in the Cloud with Coiled Functions

23 June 2023 - Dask performance benchmarking put to the test: Fixing a pandas bottleneck

14 June 2023 - Coiled notebooks

05 June 2023 - Utilizing PyArrow to improve pandas and Dask workflows

18 May 2023 - Distributed printing

16 May 2023 - Observability for Distributed Computing with Dask

15 May 2023 - GIL monitoring in Dask

05 May 2023 - Performance testing at Coiled

05 May 2023 - How well does Dask run on Graviton?

18 April 2023 - Upstream testing in Dask

04 April 2023 - Burstable vs non-burstable AWS instance types for data engineering workloads

15 March 2023 - Shuffling large data at constant memory in Dask

23 February 2023 - Just in time Python environments

17 January 2023 - How many PEPs does it take to install a package?

06 January 2023 - Scaling Hyperparameter Optimization With XGBoost, Optuna, and Dask

06 January 2023 - Handling Unexpected AWS IAM Changes

06 January 2023 - AWS Cost Explorer Tips and Tricks

Posted in 2022

19 December 2022 - Automated Data Pipelines On Dask With Coiled & Prefect

09 February 2022 - Reading CSV files into Dask DataFrames with read_csv

01 January 2022 - Writing Parquet Files with Dask using to_parquet

01 January 2022 - Why we passed on Kubernetes

01 January 2022 - Use Mambaforge to Conda Install PyData Stack on your Apple M1 Silicon Machine

01 January 2022 - Understanding Managed Dask (Dask as a Service)

01 January 2022 - Tackling unmanaged memory with Dask

01 January 2022 - Speed up a pandas query 10x with these 6 Dask DataFrame tricks

01 January 2022 - Spark to Dask: The Good, Bad, and Ugly of Moving from Spark to Dask

01 January 2022 - Snowflake and Dask: a Python Connector for Faster Data Transfer

01 January 2022 - Seven Stages of Open Software

01 January 2022 - Setting a Dask DataFrame index

01 January 2022 - Search at Grubhub and User Intent

01 January 2022 - Scikit-learn + Joblib: Scale your Machine Learning Models for Faster Training

01 January 2022 - Scale your data science workflows with Python and Dask

01 January 2022 - Save Money with Spot

01 January 2022 - Repartitioning Dask DataFrames

01 January 2022 - Reducing memory usage in Dask workloads by 80%

01 January 2022 - Reduce memory usage with Dask dtypes

01 January 2022 - PyArrow Strings in Dask DataFrames

01 January 2022 - Prioritizing Pragmatic Performance for Dask

01 January 2022 - Perform a Spatial Join in Python

01 January 2022 - Introducing the Dask Active Memory Manager

01 January 2022 - How to Merge Dask DataFrames

01 January 2022 - How to Convert a pandas DataFrame into a Dask DataFrame

01 January 2022 - How Coiled sets memory limit for Dask workers

01 January 2022 - Filtering Dask DataFrames with loc

01 January 2022 - Enterprise Dask Support

01 January 2022 - Easily Run Python Functions in Parallel

01 January 2022 - Double River: Enhanced Algorithmic Trading Performance

01 January 2022 - Dask on GCP

01 January 2022 - Dask on Azure

01 January 2022 - Dask on AWS

01 January 2022 - Dask for Parallel Python

01 January 2022 - Dask and the PyData Stack

01 January 2022 - Dask Read Parquet Files into DataFrames with read_parquet

01 January 2022 - Creating Disk Partitioned Lakes with Dask using partition_on

01 January 2022 - Cost Savings with Dask and Coiled

01 January 2022 - Convert Large JSON to Parquet with Dask

01 January 2022 - Coiled, one year in

01 January 2022 - Coiled Cloud Architecture

01 January 2022 - Code Formatting Jupyter Notebooks with Black

01 January 2022 - Better Shuffling in Dask: a Proof-of-Concept

01 January 2022 - Automate your ETL Jobs in the Cloud with GitHub Actions, S3 and Coiled

01 January 2022 - Accelerating Microstructural Analytics with Dask and Coiled

01 January 2022 - Abalone Bio: Accelerating Antibody Discovery