Coiled Documentation

Cloud Computing for Data People

Coiled helps you use Python on the cloud easily and efficiently. Built on Dask for parallel computing, Coiled provides the APIs described below.

What does this do?

Dask runs sophisticated Python code and integrates with many libraries. The example below uses pandas; for more applications, see the Dask examples.

# Make Dask cluster
import coiled
cluster = coiled.Cluster(
    n_workers=100,
)
client = cluster.get_client()

# Use Dask + Pandas together
import dask.dataframe as dd

df = dd.read_parquet("s3://bucket/lots-of-data.parquet")
df.groupby("name").amount.sum().compute()
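Dask splits a large dataset into many partitions, so a groupby-sum like the one above can be computed in two stages: each partition is aggregated on its own, and the small partial results are then combined. The pure-Python sketch below illustrates that split-apply-combine idea with made-up in-memory partitions standing in for pandas DataFrames on different workers; it is a simplified illustration, not Dask's actual implementation.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical partitions: with Dask, each would be a pandas DataFrame
# on a different worker; here each is a list of (name, amount) rows.
partitions = [
    [("alice", 10), ("bob", 5), ("alice", 1)],
    [("bob", 7), ("carol", 3)],
    [("alice", 2), ("carol", 4)],
]

def partial_sums(rows):
    """Per-partition step: sum amounts per name within one partition."""
    out = defaultdict(int)
    for name, amount in rows:
        out[name] += amount
    return dict(out)

def combine(a, b):
    """Combine step: merge two partial results by adding matching keys."""
    merged = dict(a)
    for name, amount in b.items():
        merged[name] = merged.get(name, 0) + amount
    return merged

# Partitions are aggregated independently (in parallel on a cluster),
# then the partial results are merged into the final answer.
result = reduce(combine, map(partial_sums, partitions))
print(result)  # {'alice': 13, 'bob': 12, 'carol': 7}
```

Because only the per-partition summaries travel between machines, the combine step stays cheap even when the raw data is very large.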

Run simple functions on cloud hardware close to your data, and scale up easily with the .map method. For more information and examples, see Coiled Functions.

import random

import coiled

@coiled.function()
def estimate_pi(n: int) -> float:
    total = 0
    for _ in range(n):
        x = random.random()
        y = random.random()
        if x ** 2 + y ** 2 < 1:
            total += 1
    return total / n * 4

pi = estimate_pi(100_000)
print(pi)
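The .map method runs the same function over many inputs in parallel. To show that fan-out pattern without a cloud account, the sketch below uses the standard library's `ThreadPoolExecutor.map` as a local stand-in for Coiled's `.map`. The extra `seed` parameter on `estimate_pi` is an assumption added here so each task draws an independent, reproducible random stream; it is not part of the original example.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def estimate_pi(n: int, seed: int) -> float:
    """Monte Carlo estimate of pi from n random points in the unit square."""
    rng = random.Random(seed)  # independent, reproducible stream per task
    total = 0
    for _ in range(n):
        x = rng.random()
        y = rng.random()
        if x ** 2 + y ** 2 < 1:
            total += 1
    return total / n * 4

# Fan out ten independent tasks, analogous to calling .map on a
# Coiled Function; here they run in local threads instead of cloud VMs.
n_tasks = 10
with ThreadPoolExecutor(max_workers=4) as pool:
    estimates = list(pool.map(estimate_pi, [100_000] * n_tasks, range(n_tasks)))

# Averaging independent estimates reduces the variance of the result.
pi = sum(estimates) / len(estimates)
print(pi)
```

Threads keep the sketch portable, but for CPU-bound pure-Python work they do not run truly in parallel locally; real parallelism is what the separate cloud machines provide when the same pattern runs on Coiled.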

Run an executable on a cloud VM. Dead simple.

See Interactive CLI Jobs for more information.

coiled run echo "Hello, world"

Run Jupyter on large cloud-based VMs. Synchronize your files back to your local hard drive.

For more information see Jupyter Notebooks.

coiled notebook start --sync --vm-type m6i.16xlarge
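The --sync flag keeps files in step between the notebook VM and your local machine. Purely to illustrate the underlying idea, the sketch below performs a naive one-way sync by comparing file modification times; `sync_one_way` is a hypothetical helper, and this is not how Coiled implements synchronization.

```python
import shutil
import tempfile
from pathlib import Path

def sync_one_way(src: Path, dst: Path) -> list[str]:
    """Copy files from src to dst that are missing from dst or newer in src.

    Returns the copied paths relative to src. Deletions are not propagated.
    """
    copied = []
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        if not target.exists() or path.stat().st_mtime > target.stat().st_mtime:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves the modification time
            copied.append(rel.as_posix())
    return sorted(copied)

# Usage: temporary directories stand in for the VM's working directory
# ("remote") and a folder on your laptop ("local").
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    remote, local = Path(a), Path(b)
    (remote / "results").mkdir()
    (remote / "results" / "out.csv").write_text("x,y\n1,2\n")
    first = sync_one_way(remote, local)   # copies the new file
    second = sync_one_way(remote, local)  # nothing changed, nothing copied
    print(first, second)  # ['results/out.csv'] []
```

Preserving the modification time on copy is what makes the second call a no-op: the copied file is no longer considered older than its source.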

How does this work?

Coiled quickly creates cloud VMs that match your local environment. This lets you run on bigger, faster, or more machines while keeping the ease and familiarity of normal development.

  1. Code Locally: You write normal Python wherever you do today (like your laptop) and submit that code to run on Coiled.

  2. Launch VMs: Coiled rapidly creates ephemeral VMs to run your code (this takes about a minute).

  3. Environment synchronization: Coiled inspects your machine for packages, scripts, and credentials, then quickly replicates them on your remote machines so that they match your development environment.

  4. Execute and monitor: Your code runs at scale while Coiled collects metrics in the background to help you debug and optimize.

  5. Robust Cleanup: Everything cleans up when you’re done, leaving you with a clean slate and low costs.
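Step 5 depends on teardown happening even when your code fails partway through. The sketch below shows that guarantee with a Python context manager; `ephemeral_vms` is a hypothetical helper that only records lifecycle events and does not create real machines or call the Coiled API.

```python
from contextlib import contextmanager

events = []  # records lifecycle steps so their order can be inspected

@contextmanager
def ephemeral_vms(n: int):
    """Hypothetical stand-in for launching n VMs and always tearing them down."""
    vms = [f"vm-{i}" for i in range(n)]
    events.append(f"launch {n} VMs")
    try:
        yield vms
    finally:
        # Runs on normal exit and also when the user's code raises.
        events.append(f"shut down {n} VMs")

try:
    with ephemeral_vms(3) as vms:
        events.append(f"run job on {len(vms)} VMs")
        raise RuntimeError("job failed")
except RuntimeError:
    pass

print(events)
# ['launch 3 VMs', 'run job on 3 VMs', 'shut down 3 VMs']
```

Putting cleanup in a `finally` block is what keeps costs low after a crash: the shutdown step runs whether the job succeeds or raises.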

Combining environment scraping with rapid deployment of raw VMs gives Coiled a compute stack that aims to be easy, powerful, and cheap.
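Environment scraping begins by discovering what is installed locally. A minimal sketch of that first step, assuming only the standard library's `importlib.metadata`, lists the installed Python distributions as pinned `name==version` requirements. `scrape_environment` is a hypothetical helper; Coiled's actual mechanism, which also covers scripts and credentials, is not reproduced here.

```python
from importlib.metadata import distributions

def scrape_environment() -> list[str]:
    """Return a sorted list of 'name==version' pins for installed packages."""
    pins = set()
    for dist in distributions():
        meta = dist.metadata
        if "Name" not in meta:  # skip broken installs that lack metadata
            continue
        pins.add(f"{meta['Name']}=={dist.version}")
    return sorted(pins, key=str.lower)

pins = scrape_environment()
print(len(pins), "packages, e.g.", pins[:3])
```

A pinned list like this is the kind of input a remote machine needs to reproduce the same package versions you develop against.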

Examples & Use Cases

Machine Learning

Train, predict, and track on cloud hardware with Coiled.

Dask vs. Spark

Dask is faster and easier to use than Spark.

Production ETL

Run lightweight data pipelines on a schedule on the cloud.

Geospatial

Process TBs of geospatial data with Coiled and Xarray.

GPU Jupyter Notebook

Easily run Jupyter notebooks on GPU-equipped cloud hardware.

General Python

Parallelize custom Python functions on the cloud.