Coiled User Guide

Cloud Computing for Data People

Coiled helps you use Python on the cloud easily and efficiently. It provides tools to run batch jobs, decorated Python functions, Dask clusters, and Jupyter notebooks in the cloud.

Meet the Tools

With Batch Jobs, run any code (it doesn’t need to be Python) in the cloud from the comfort of your terminal.

$ coiled batch run --memory 64GB --region us-west-2 my_script.sh

With Coiled Functions, decorate a function to run it in the cloud. Scale out with the .map method.

import coiled
import pandas as pd

@coiled.function(region="us-west-2")  # Run close to the data
def process(filename):
    output_filename = filename[:-4] + ".parquet"  # swap the ".csv" suffix
    df = pd.read_csv(filename)
    df.to_parquet(output_filename)
    return output_filename

# result = process("s3://my-bucket/data.csv")  # one file

# filenames is a list of CSV paths that you define
results = process.map(filenames)   # many files in parallel
for filename in results:
    print("Finished", filename)
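
Before sending work to the cloud, it can help to dry-run the filename logic and the map pattern locally. The sketch below is not part of Coiled: `parquet_name`, `process_local`, and the input paths are hypothetical names for illustration, and `ThreadPoolExecutor.map` is only a local stand-in for `process.map`. It also swaps the extension with `posixpath.splitext` rather than slicing off the last four characters, so the result does not depend on the extension's length.

```python
import posixpath
from concurrent.futures import ThreadPoolExecutor


def parquet_name(filename: str) -> str:
    """Replace the final file extension with .parquet."""
    root, _ext = posixpath.splitext(filename)
    return root + ".parquet"


def process_local(filename: str) -> str:
    # Stand-in for the real work: only computes the output name,
    # no file is read or written.
    return parquet_name(filename)


# Hypothetical input paths, for illustration only.
filenames = ["s3://my-bucket/a.csv", "s3://my-bucket/b.csv"]

# Local stand-in for process.map: apply the function to every input.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(process_local, filenames))

for filename in results:
    print("Finished", filename)
```

Once the local version behaves as expected, the same function body can go under the `@coiled.function` decorator shown above.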

With Coiled Clusters, deploy and scale a Dask cluster. Pandas is shown below, but for more applications see Dask examples.

import coiled
cluster = coiled.Cluster(
    n_workers=100,
)
client = cluster.get_client()

# Use Dask + Pandas together
import dask.dataframe as dd

df = dd.read_parquet("s3://bucket/lots-of-data.parquet")
df.groupby("name").amount.sum().compute()

With Coiled Notebooks, run Jupyter on large cloud-based VMs and synchronize your files back to your local hard drive.

$ coiled notebook start --sync --vm-type m6i.16xlarge

How Does This Work?

Coiled quickly creates cloud VMs that match your local environment. This lets you run on bigger, faster, or more numerous machines while keeping the ease and familiarity of normal development.

  1. Code Locally: You write normal Python wherever you do today (like your laptop) and submit that code to run on Coiled.

  2. Launch VMs: Coiled rapidly creates ephemeral VMs to run your code (this takes about a minute).

  3. Synchronize Environment: Coiled inspects your machine for packages, scripts, and credentials, then quickly installs them on your remote machines so that they match your development environment.

  4. Execute and Monitor: Your code runs at scale with loads of metrics running in the background to help you debug and optimize.

  5. Robust Cleanup: Everything cleans up when you’re done, leaving you with a clean slate and low costs.

Coiled’s approach of environment scraping and rapid deployment of raw VMs gives a compute stack that endeavors to be easy, powerful, and cheap.
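
To make the environment synchronization step more concrete, here is a minimal sketch of what inspecting a local Python environment for packages could look like. This is an illustration using the standard library's `importlib.metadata`, not Coiled's actual implementation; `snapshot_packages` is a hypothetical helper name, and scripts and credentials are not covered.

```python
from importlib import metadata


def snapshot_packages() -> dict:
    """Return {package name: version} for the current Python environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip installs with incomplete metadata
            packages[name] = version
    return packages


snapshot = snapshot_packages()

# Print a few pinned requirements, sorted case-insensitively by name.
for name in sorted(snapshot, key=str.lower)[:5]:
    print(f"{name}=={snapshot[name]}")
```

A list of pinned versions like this is the kind of information a remote machine would need in order to reproduce the local package set.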