Best Practices

Cloud computing is incredibly powerful when done well. Coiled makes cloud computing easy, but doing things well still requires some experience.

This page collects Coiled best practices and solutions to common problems.

Run computations close to your data

Cloud computing can be surprisingly cheap when done well. When processing large amounts of cloud-hosted data, it’s important to have your compute VMs in the same cloud region where your data is hosted. This avoids expensive data transfer costs that scale with the size of your data.

Use the region= option to provision your cloud VMs in the same region as the data you’re processing.

# Create Dask cluster in `us-west-2`
import coiled
cluster = coiled.Cluster(region="us-west-2", ...)
client = cluster.get_client()

# Load dataset that lives in `us-west-2`
import dask.dataframe as dd
df = dd.read_parquet("s3://...")
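If you are unsure which region an S3 bucket lives in, you can look it up before creating the cluster. The following is a minimal sketch, assuming `boto3` is installed, AWS credentials are configured, and you have permission to read the bucket's location. The helper names `normalize_location` and `bucket_region` and the bucket name in the usage comment are hypothetical and not part of Coiled.

```python
def normalize_location(location_constraint):
    """Map an S3 LocationConstraint value to a region name.

    S3 reports None for buckets in us-east-1 and the legacy value "EU"
    for some older buckets in eu-west-1.
    """
    if location_constraint is None:
        return "us-east-1"
    if location_constraint == "EU":
        return "eu-west-1"
    return location_constraint


def bucket_region(bucket_name):
    """Look up the region of an S3 bucket (needs boto3 and credentials)."""
    import boto3  # imported lazily so normalize_location works without boto3

    s3 = boto3.client("s3")
    response = s3.get_bucket_location(Bucket=bucket_name)
    return normalize_location(response.get("LocationConstraint"))


# Example usage (placeholder bucket name):
# region = bucket_region("my-bucket")
# cluster = coiled.Cluster(region=region, ...)
```

Passing the looked-up value to region= keeps the cluster next to the data without hard-coding a region name.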

Create a fresh software environment

By default Coiled inspects your local environment for Python packages and replicates that same environment on cloud VMs. This usually works well but can run into issues when your local software environment has inconsistent versions of packages installed (as tends to happen with long-lived environments that evolve organically over time).

In these cases, creating a new, fresh local software environment with the libraries you need installed almost always resolves package consistency issues.
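Before rebuilding an environment, it can help to confirm that it really is inconsistent. One rough signal is the same package appearing with more than one version. The sketch below checks for that using only the standard library. It is a heuristic and not how Coiled inspects environments, and the function names are illustrative.

```python
from collections import defaultdict
from importlib import metadata


def find_duplicates(name_version_pairs):
    """Return {name: sorted versions} for names seen with more than one version."""
    versions = defaultdict(set)
    for name, version in name_version_pairs:
        # Normalize so "My_Package" and "my-package" compare equal
        key = name.lower().replace("_", "-").replace(".", "-")
        versions[key].add(version)
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


def installed_pairs():
    """Yield (name, version) for every distribution visible to this interpreter."""
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken metadata
            yield name, dist.version


if __name__ == "__main__":
    for name, found in find_duplicates(installed_pairs()).items():
        print(f"{name}: multiple versions installed: {', '.join(found)}")
```

No output means no duplicates were found. It does not prove the environment is consistent, so a fresh environment remains the more reliable fix.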

Use conda for non-Python libraries

By default Coiled inspects your local environment for Python packages and replicates that same environment on cloud VMs. For pure Python packages installed with pip or conda, this usually works well. However, for libraries that depend on additional system packages such as gdal or graphviz, automatic package synchronization can fail if those system packages weren’t installed with pip or conda (e.g. apt or brew was used instead).

In these cases, we recommend using conda to install the more complex libraries, as Coiled’s package synchronization will handle these properly.

Using graphviz as an example, replace this:

brew install graphviz  # Install system graphviz
pip install graphviz   # Install Python bindings

with this:

conda install python-graphviz  # Install system graphviz and Python bindings
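To see whether a system tool comes from your active environment or from somewhere else on the machine (for example a brew or apt install), you can check where its executable resolves. This is a rough local check, assuming the tool exposes a command-line program (`dot` for Graphviz). The helper names are illustrative and not part of Coiled.

```python
import shutil
import sys
from pathlib import Path


def is_inside_prefix(executable_path, prefix):
    """Return True if executable_path is located under the prefix directory."""
    exe = Path(executable_path).resolve()
    root = Path(prefix).resolve()
    return root == exe or root in exe.parents


def check_tool(name):
    """Report where a command-line tool is found relative to sys.prefix."""
    found = shutil.which(name)
    if found is None:
        return f"{name}: not found on PATH"
    if is_inside_prefix(found, sys.prefix):
        return f"{name}: provided by the active environment ({found})"
    return f"{name}: found outside the active environment ({found})"


if __name__ == "__main__":
    # "dot" is the main Graphviz command-line program
    print(check_tool("dot"))
```

A tool reported as outside the active environment is a candidate for reinstalling with conda.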

Use Dask best practices

When using Dask on Coiled, continue to follow the usual Dask best practices.

Set your default Coiled workspace

You can use the workspace= option to switch between different Coiled workspaces:

import coiled
cluster_dev = coiled.Cluster(workspace="company-dev", ...)
cluster_prod = coiled.Cluster(workspace="company-prod", ...)

However, it is easy to forget to do this, especially for new users who were recently added to an existing workspace.

If you use multiple Coiled workspaces, we recommend setting your default workspace from your profile page to the one you use most often.
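In scripts that must never run in the wrong place, one complementary option is to make the choice explicit and fail loudly when it is missing. The sketch below is one possible pattern and not a Coiled feature. The `DEPLOY_ENV` variable, the mapping keys, and the `resolve_workspace` helper are assumptions made for illustration. The workspace names are taken from the example above.

```python
import os

# Workspace names come from the example above; the mapping keys and the
# DEPLOY_ENV variable are illustrative assumptions, not Coiled settings.
WORKSPACES = {"dev": "company-dev", "prod": "company-prod"}


def resolve_workspace(environ=None):
    """Pick a workspace name from DEPLOY_ENV, raising if it is unset or unknown."""
    environ = os.environ if environ is None else environ
    target = environ.get("DEPLOY_ENV")
    if target not in WORKSPACES:
        raise ValueError(
            f"DEPLOY_ENV must be one of {sorted(WORKSPACES)}, got {target!r}"
        )
    return WORKSPACES[target]


# Example usage:
# cluster = coiled.Cluster(workspace=resolve_workspace(), ...)
```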

See Manage Users for more information on managing Coiled accounts.