Best Practices#

Cloud computing is incredibly powerful when done well. Coiled makes cloud computing easy, but doing things well still requires some experience.

This page contains suggestions for Coiled best practices and includes solutions to common Coiled problems.

Run computations close to your data#

Cloud computing can be surprisingly cheap when done well. When processing large amounts of cloud-hosted data, it’s important to have your compute VMs in the same cloud region where your data is hosted. This avoids expensive data transfer costs that scale with the size of your data.

Avoid data transfer costs by using the region= option to provision your cloud VMs in the same region as the data you’re processing.

# Create Dask cluster in `us-west-2`
import coiled
cluster = coiled.Cluster(region="us-west-2", ...)
client = cluster.get_client()

# Load dataset that lives in `us-west-2`
import dask.dataframe as dd
df = dd.read_parquet("s3://...")

Create a fresh software environment#

By default Coiled inspects your local environment for Python packages and replicates that same environment on cloud VMs. This usually works well but can run into issues when your local software environment has inconsistent versions of packages installed (as tends to happen with long-lived environments that evolve organically over time).

In these cases, creating a new, fresh local software environment with the libraries you need installed almost always resolves package consistency issues.
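For example, a fresh conda environment containing only what your workflow needs can be created and activated like this (the environment name and package list below are illustrative):

conda create -n coiled-fresh -c conda-forge python=3.11 coiled dask
conda activate coiled-fresh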

Use conda for non-Python libraries#

By default Coiled inspects your local environment for Python packages and replicates that same environment on cloud VMs. For pure Python packages installed with pip or conda, this usually works great. However, for more complex libraries that involve installing additional system packages like gdal, graphviz, etc., automatic package synchronization can fail if those additional system packages weren’t installed with pip or conda (e.g. apt or brew were used instead).

In these cases, we recommend using conda to install more complex libraries as Coiled’s package synchronization will handle these properly.

Using graphviz as an example, replace this:

brew install graphviz  # Install system graphviz
pip install graphviz   # Install Python bindings

with this:

conda install python-graphviz  # Install system graphviz and Python bindings

Use Dask best practices#

When using Dask on Coiled, continue to follow the usual Dask best practices described in the Dask documentation.
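For example, two Dask best practices that carry over directly to Coiled are reading only the data you need straight from cloud storage and computing related results together instead of calling compute repeatedly. The snippet below is a minimal sketch; the bucket path and column names are hypothetical.

import dask
import dask.dataframe as dd

# Read only the columns you need, directly from cloud storage
df = dd.read_parquet("s3://mybucket/dataset/", columns=["amount", "category"])

# Compute related results together so Dask can share the underlying work
total, counts = dask.compute(df.amount.sum(), df.category.value_counts())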

Set your default Coiled workspace#

You can use the workspace= option to switch between different Coiled accounts:

import coiled
cluster_dev = coiled.Cluster(workspace="company-dev", ...)
cluster_prod = coiled.Cluster(workspace="company-prod", ...)

However, it can be easy to forget to do this, especially for new users who were recently added to an existing account.

If you use multiple Coiled accounts, we recommend setting your default account from your profile page to the account you use most often.

See Manage Users for more information on managing Coiled accounts.

GPU availability#

GPUs are in high demand these days and can be hard to find. There are two things you can do to help address this:

  • Avoid the larger GPU instance types, if you can, in particular A100 and H100 types. These GPUs have the large amounts of memory needed to run LLMs, and so are in especially high demand. If you can get away with it, we recommend smaller GPUs, like the A10s available in g5.xlarge instances, which are generally easier to find.

  • Search different regions where GPUs may be more available. Avoid large amounts of cross-region data transfer (this can quickly become expensive), but if you’re not moving around large volumes of data, trying different regions can open up more availability than would otherwise be possible. To do this, use the region= keyword for @coiled.function or coiled.Cluster (see the sketch after this list).
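As a sketch of the second point, the example below asks for a smaller GPU instance type in an alternative region. The vm_type= keyword shown here, and the specific instance type and region, are illustrative choices rather than recommendations.

import coiled

@coiled.function(
    vm_type="g5.xlarge",  # smaller A10 GPU, generally easier to find
    region="us-east-2",   # try a region with better GPU availability
)
def run_inference(batch):
    ...
    return batch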

Use S3 for data, not your local hard drive#

You can keep your files, notebooks, and models on your local drive (Coiled is good about synchronizing these to cloud machines), but it’s usually best to keep your data in the cloud, especially when it’s large. Fortunately, with libraries like s3fs and gcsfs, or the AWS CLI, this is pretty straightforward.

import s3fs

s3 = s3fs.S3FileSystem()
s3.get("s3://mybucket/myfile", "./local/file")

# or

with s3.open("s3://mybucket/myfile", mode="rb") as f:
    data = f.read()

For more information, read Connect to remote data in the Dask documentation.

Third-party platform authentication#

Commonly used platforms like Hugging Face, MLflow, etc. have their own authentication systems for accessing private assets. When using these platforms on Coiled, cloud VMs also need to be authenticated with the platform. Most platforms support authentication through environment variables (e.g. Hugging Face uses HF_TOKEN). We recommend using the environ= keyword in @coiled.function or coiled.Cluster to securely pass authentication information to Coiled cloud VMs.

import coiled
from transformers import pipeline

@coiled.function(environ={"HF_TOKEN": "<your-token>"})  # replace <your-token> with your Hugging Face token
def train(file):
    transcriber = pipeline(task="automatic-speech-recognition")
    ...
    return result
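Calling the decorated function then runs it on a cloud VM with HF_TOKEN set in its environment, for example (the file path here is hypothetical):

result = train("s3://mybucket/audio-sample.flac")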