Machine Learning with Coiled

Coiled enables Machine Learning by making it easy to run Python code on lots of cloud hardware.

This page might interest you if …

You want to do …

  • Model training on GPU machines, or …

  • Batch prediction on lots of data, or …

  • Enhancing your local dev environment with remote GPUs

Pain you’ve felt …

  • Setting up GPUs with Docker is hell

  • Training is expensive without metrics

  • Canned systems (like SageMaker/Colab) feel constraining

Coiled enhances your ML workflows by making it really easy to get cloud GPU machines that are immediately configured like your current machine. Coiled’s approach …

  • Reduces setup pain: by automatically configuring your cloud machines exactly like your current machine (but with more hardware / GPUs)

  • Lets you scale easily: with short boot times and autoscaling that lets you process large volumes of data efficiently

  • Gives visibility: with intuitive dashboards, user/cost controls, and sensible cloud configuration.

# --vm-type: specify any GPU instance type
# --region:  run in any region
coiled run \
    --vm-type g5.8xlarge \
    --region us-west-2 \
    python train.py            # Run your local scripts

import coiled

@coiled.function(
    vm_type="g5.8xlarge",       # Specify any GPU instance type
    region="us-west-2",         # Run in any region
)
def predict(filename):
    ...

predict.map(filenames)          # Scale out to many machines

# --vm-type: specify any GPU instance type
# --region:  run in any region
coiled notebook start \
    --vm-type g5.8xlarge \
    --region us-west-2 \
    --sync                     # Sync your local files

In the rest of this page we briefly summarize common workflows where easy-to-access cloud hardware makes Machine Learning more effective and enjoyable:

  1. Interactive development

  2. Train models

  3. Batch prediction

Interactive development

When experimenting with new models or exploring new data you often want to work interactively, either with a Jupyter notebook or with some other interactive development environment like VSCode or IPython. Coiled can enhance this experience with cloud hardware, with minimal disruption to how you work.

GPU Jupyter Notebooks

Coiled can launch a GPU-enabled Jupyter Notebook for you, copy over all of your local files, and replicate your local software packages, configured for use with GPUs.

# --vm-type: use a GPU
# --region:  run close to your data
coiled notebook start \
    --vm-type g5.xlarge \
    --region us-west-2 \
    --sync                  # Sync your local files

This has a few benefits:

  • GPU access: You’ll get temporary access to a GPU machine for as long as the notebook is running. When you’re done, it’ll turn off to minimize costs.

  • Software synchronization: You’ll get all of your current Python packages, including the same version of libraries like PyTorch, but those libraries will be properly configured (with CUDA drivers and everything) to operate on the underlying GPU. You don’t need to mess around with Docker.

  • File synchronization: You’ll also get all of your local files, including your notebooks, live-synced up to the cloud machine running Jupyter. Additionally, any changes you make in the notebook on the cloud machine will be synchronized back down to your computer, saving all of your changes on your local hard drive.

This creates a seamless experience for interactive work. For more information, see the Coiled notebook documentation.

For a fully worked example, see Example: Jupyter with a GPU.

Interactive Serverless Functions

If you don’t want to use Jupyter but do want to work interactively with GPUs, we recommend wrapping your code in a Coiled Function that targets GPU hardware.

import coiled
import torch

@coiled.function(
    vm_type="g5.xlarge",     # A10G GPU on AWS
    keepalive="20 minutes",  # Keep VM warm for 20 minutes
)
def train():
    ...
    return model.to("cpu")   # Send model back to your laptop for storage

model = train()

You can run this code from any IDE (VSCode, IPython, Vim, …); only the decorated function runs on the GPU machine. Because of keepalive="20 minutes", that machine stays up during your Python session and for 20 minutes after it ends, so you can iterate quickly without waiting for a new machine to start up every time.

Note the model.to("cpu") call: the function moves the trained model off the GPU before returning it, so that the model can be loaded in our CPU-only local development environment.

For a fully worked example, see Example: GPU PyTorch with Serverless Functions.

Train Models

Interactive workflows are great for experimentation, but eventually you figure out what you want to run and you put it into a .py file or script so that it can be run on demand or regularly:

python train.py

To run this same script on a cloud machine, use coiled run:

coiled run --vm-type g5.xlarge python train.py

As with interactive workloads, this:

  • Creates ephemeral cloud machines so that you can get GPU resources when you need them, and shut them down immediately afterwards.

  • Synchronizes software including GPU libraries like PyTorch, but properly configured for the underlying GPU. This also grabs any custom or local Python packages and Python files you have in your working directory.

For more information, see the Coiled Run documentation.

For a fully worked example, see Example: Train a GPU-accelerated PyTorch model.

Hyperparameter Optimization

For many people, basic hyperparameter optimization suffices when training models, as in this simple example with Coiled Functions:

import coiled

@coiled.function(
    vm_type="g5.xlarge",             # Specify GPU VM type
)
def train_and_score(learning_rate):  # Write Python locally on your machine
    ...
    return score, learning_rate      # Return the input too, so each result stays labeled

learning_rates = [...]
results = train_and_score.map(learning_rates)  # Autoscale across many VMs
best_score, best_lr = max(results)             # Highest score and its learning rate
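Before paying for GPUs, it can help to check the sweep logic locally. The sketch below is a local stand-in, not Coiled’s API: it uses the standard library’s concurrent.futures.ThreadPoolExecutor.map in place of a Coiled Function’s .map, and a hypothetical toy_score function in place of real training. Each task returns its score together with its input, so the winning learning rate comes back with its score without relying on result order.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_score(learning_rate: float) -> tuple[float, float]:
    """Hypothetical stand-in for train_and_score: the score peaks at 0.01."""
    score = -abs(learning_rate - 0.01)
    return score, learning_rate  # Return the input so each result stays labeled


def sweep(learning_rates: list[float]) -> tuple[float, float]:
    """Evaluate every learning rate in parallel and return (best_score, best_lr)."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(toy_score, learning_rates)
        return max(results)  # Tuples compare by score first


best_score, best_lr = sweep([0.1, 0.03, 0.01, 0.003, 0.001])
print(best_lr)
```

The same pattern applies to a Coiled Function: if train_and_score returns (score, learning_rate), taking max over the mapped results picks the winner.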

Alternatively, state-of-the-art hyperparameter optimization (HPO) libraries like Optuna work well with Dask and Coiled, as in this more sophisticated example:

import coiled
import optuna

def objective(trial):              # Define arbitrary objective function
    ...                            # Do any custom logic you like in a black box
    return result                  # Return score

with coiled.Cluster(...) as cluster:
    with cluster.get_client() as client:
        study = optuna.create_study(
            ...
            storage=optuna.integration.DaskStorage(),
        )
        ...

For a fully worked example, see Example: HPO with Optuna.

Experiment Tracking

Throughout the lifecycle of a model, but especially during training, tracking and visualizing model results is crucial. You can use MLOps platforms like MLflow, Weights & Biases, or Neptune on Coiled to keep track of your experiments, even when running on cloud machines.

import coiled
import mlflow

@coiled.function(
    vm_type="g5.xlarge",                     # Use GPU Machine
    environ={"MLFLOW_TRACKING_URI": "..."},  # Pass MLflow Credentials
)
def train(params):
    mlflow.log_params(params)                # Log parameters during training
    ...

train({...})                                 # Function evaluates on cloud machine

For a fully worked example, see Example: MLOps with MLflow.

Batch Prediction

After you have a nicely trained model, you often want to apply that model across large volumes of data. For this we recommend Coiled Functions, which let you easily map a Python function across many inputs in parallel.

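import pathlib
import tempfile

import coiled
import s3fs
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

s3 = s3fs.S3FileSystem()
model_path = "s3://mybucket/mymodel.pt"

@coiled.function(
    vm_type="g5.xlarge",                    # Use GPU Machine
    region="us-east-2",                     # Close to where data lives
)
def predict(filename):
    model = torch.load(...).to("cuda:0")    # Use GPU-Accelerated Model

    with tempfile.TemporaryDirectory() as tmpdir:
        path = pathlib.Path(tmpdir) / "myfile"
        s3.get(filename, str(path))         # Download one file
        image = Image.open(path).convert("RGB")
        # Convert to a batched GPU tensor; adapt to your model's preprocessing
        batch = to_tensor(image).unsqueeze(0).to("cuda:0")
        with torch.no_grad():
            return model(batch).cpu()       # Return results on the CPU

filenames = s3.glob("s3://.../*.png")       # Get list of files
results = predict.map(filenames)            # Process all files in parallel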

This lets you develop your code locally and make sure it works on single files, then scale out to lots of data on lots of GPUs when you’re ready. Coiled adaptively scales to enough machines to finish your job in a few minutes, assuming GPU availability in your region.
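In the batch prediction example, each call loads the model once per file. When loading the model is slow relative to predicting on one image, one option is to map over batches of filenames instead, so each call loads the model once per batch. A minimal sketch of the batching step using only the standard library; batched is our own helper (Python 3.12 ships a similar itertools.batched, which yields tuples), and predict_batch in the comment is a hypothetical Coiled Function that accepts a list of filenames.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items (the last may be shorter)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


# Hypothetical usage with a Coiled Function that takes a list of filenames:
#     results = predict_batch.map(list(batched(filenames, 100)))
filenames = [f"s3://mybucket/images/{i}.png" for i in range(5)]  # Illustrative paths
print(list(batched(filenames, 2)))
```

Larger batches mean fewer model loads but fewer, coarser units of work, which leaves the autoscaler less to spread across machines.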

For more information, see the Coiled Functions documentation.

Best Practices and Common Issues

GPU availability

GPUs are in high demand these days and can be hard to find, which leads to availability issues. There are two things you can do to help address this:

  • Avoid the larger GPU instance types if you can, in particular A100 and H100 types. These GPUs have the large amounts of memory needed to run LLMs, so they are in particularly high demand. If you can get away with it, we recommend using smaller GPUs, like the A10G in the g5.xlarge instance type used above, which are generally more available.

  • Search different regions, where GPUs may be more or less available. You should avoid large amounts of cross-region data transfer (this can quickly become expensive), but if you’re not moving large volumes of data around, trying different regions can give you more availability than any single region offers. To do this, use the region= keyword for @coiled.function or coiled.Cluster.

Use S3 for data, not your local hard drive

You can keep your files, notebooks, and models on your local drive (Coiled is good about synchronizing these to cloud machines), but it’s usually best to keep your data in the cloud, especially when it’s large. Fortunately, with tools like s3fs, gcsfs, and the AWS CLI, this is pretty straightforward.

import s3fs

s3 = s3fs.S3FileSystem()
s3.get("s3://mybucket/myfile", "./local/file")

# or

with s3.open("s3://mybucket/myfile", mode="rb") as f:
    data = f.read()

For more information, read Connect to remote data in the Dask documentation.

Third-party platform authentication

Commonly used platforms like Hugging Face, MLflow, etc. have their own authentication systems for accessing private assets. When you use these platforms on Coiled, the cloud VMs also need to be authenticated with the platform. Most platforms support authentication through environment variables (e.g. Hugging Face uses HF_TOKEN). We recommend using the environ= keyword in @coiled.function or coiled.Cluster to securely pass authentication information to Coiled cloud VMs.

import os

import coiled
from transformers import pipeline

@coiled.function(
    environ={"HF_TOKEN": os.environ["HF_TOKEN"]},  # Forward your local Hugging Face token
)
def transcribe(file):
    transcriber = pipeline(task="automatic-speech-recognition")
    ...
    return result
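Rather than pasting tokens into source code, you can forward values already set in your local shell. A minimal sketch, assuming the variables you need (for example HF_TOKEN or MLFLOW_TRACKING_URI) are set locally; forward_env is a hypothetical helper, not part of Coiled. It fails loudly when a variable is missing, so a cloud job does not start without its credentials.

```python
import os
from collections.abc import Iterable, Mapping


def forward_env(names: Iterable[str], source: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: copy the named variables from the local environment.

    Raises KeyError listing any names that are not set, instead of silently
    sending a missing credential to the cloud VMs.
    """
    names = list(names)
    missing = [name for name in names if name not in source]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {name: source[name] for name in names}
```

You would then pass the result as environ=forward_env(["HF_TOKEN"]) to @coiled.function or coiled.Cluster.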