Best Practices¶
Cloud computing is incredibly powerful when done well. Coiled makes cloud computing easy, but doing things well still requires some experience.
This page contains suggestions for Coiled best practices and includes solutions to common Coiled problems.
Run computations close to your data¶
Cloud computing can be surprisingly cheap when done well. When processing large amounts of cloud-hosted data, it’s important to have your compute VMs in the same cloud region where your data is hosted. This avoids expensive data transfer costs that scale with the size of your data.
Avoid data transfer costs by using the region= option to provision your cloud VMs in the same region as the data you’re processing.
# Create Dask cluster in `us-west-2`
import coiled
cluster = coiled.Cluster(region="us-west-2", ...)
client = cluster.get_client()
# Load dataset that lives in `us-west-2`
import dask.dataframe as dd
df = dd.read_parquet("s3://...")
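If you are not sure which region an S3 bucket lives in, you can look it up before creating the cluster. The following is a minimal sketch, not part of Coiled: it assumes boto3 is installed and that your AWS credentials are allowed to call GetBucketLocation. The helper names are hypothetical.

```python
def region_from_location_constraint(constraint):
    """Convert the LocationConstraint value reported by S3 into a region name."""
    # S3 reports buckets in us-east-1 with an empty constraint, and some
    # older eu-west-1 buckets with the legacy value "EU".
    if not constraint:
        return "us-east-1"
    if constraint == "EU":
        return "eu-west-1"
    return constraint


def bucket_region(bucket_name):
    """Look up the region of an S3 bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above has no dependencies

    response = boto3.client("s3").get_bucket_location(Bucket=bucket_name)
    return region_from_location_constraint(response.get("LocationConstraint"))
```

The value returned by bucket_region can then be passed as the region= option when creating the cluster.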
Create a fresh software environment¶
By default Coiled inspects your local environment for Python packages and replicates that same environment on cloud VMs. This usually works well but can run into issues when your local software environment has inconsistent versions of packages installed (as tends to happen with long-lived environments that evolve organically over time).
In these cases, creating a new, fresh local software environment with the libraries you need installed almost always resolves package consistency issues.
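One possible symptom of an environment that has drifted is the same package being registered more than once with different versions. The sketch below is a rough diagnostic using only the standard library, not something Coiled runs for you, and the function names are hypothetical. It will not catch every kind of inconsistency.

```python
from collections import defaultdict
from importlib import metadata


def group_versions(pairs):
    """Return {name: versions} for names that appear with more than one version."""
    seen = defaultdict(set)
    for name, version in pairs:
        # Normalize case and separators so "My_Pkg" and "my-pkg" compare equal.
        key = name.lower().replace("_", "-").replace(".", "-")
        seen[key].add(version)
    # Versions are sorted as plain strings, which is enough for display.
    return {name: sorted(found) for name, found in seen.items() if len(found) > 1}


def duplicated_distributions():
    """Scan the current environment for packages registered with several versions."""
    pairs = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with incomplete metadata
            pairs.append((name, dist.version))
    return group_versions(pairs)
```

An empty result from duplicated_distributions() does not prove the environment is healthy, but a non-empty one is a good reason to start from a fresh environment.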
Use conda for non-Python libraries¶
By default Coiled inspects your local environment for Python packages and replicates that same environment on cloud VMs. For pure Python packages installed with pip or conda, this usually works great.
However, for more complex libraries that involve installing additional system packages like gdal, graphviz, etc., automatic package synchronization can fail if those additional system packages weren’t installed with pip or conda (e.g. apt or brew were used instead).
In these cases, we recommend using conda to install more complex libraries, as Coiled’s package synchronization will handle these properly.
Using graphviz as an example, replace this:
brew install graphviz # Install system graphviz
pip install graphviz # Install Python bindings
with this:
conda install python-graphviz # Install system graphviz and Python bindings
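To check where a system tool actually comes from, you can test whether its executable resolves to a path inside the active environment. This is a heuristic sketch with hypothetical helper names, and it assumes the environment you want to replicate is the one running the Python interpreter. A tool installed with brew or apt will usually resolve to a path outside that environment.

```python
import os
import shutil
import sys


def is_inside(path, prefix):
    """True if `path` is located inside the directory `prefix`."""
    path = os.path.abspath(path)
    prefix = os.path.abspath(prefix)
    try:
        return os.path.commonpath([path, prefix]) == prefix
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


def tool_in_environment(name, prefix=sys.prefix):
    """True if the executable `name` found on PATH lives inside `prefix`."""
    found = shutil.which(name)
    return found is not None and is_inside(found, prefix)
```

For the graphviz example, tool_in_environment("dot") returning False suggests that the system graphviz found on your PATH is not part of the active environment.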
Use Dask best practices¶
When using Dask on Coiled, continue to use normal Dask best practices:
Set your default Coiled workspace¶
You can use the workspace= option to switch between different Coiled accounts:
import coiled
cluster_dev = coiled.Cluster(workspace="company-dev", ...)
cluster_prod = coiled.Cluster(workspace="company-prod", ...)
However, it can be easy to forget to do this, especially for new users who were recently added to an existing account.
If you use multiple Coiled accounts, we recommend setting the account you use most often as your default account from your profile page.
See Manage Users for more information on managing Coiled accounts.
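In scripts shared by a team, one way to avoid relying on each user's default is to make the workspace choice explicit and fail early on an unknown target. The sketch below is only an illustration: the mapping reuses the example workspace names from above, and workspace_for is a hypothetical helper, not part of Coiled.

```python
# Example names taken from the snippet above; replace with your own workspaces.
WORKSPACES = {"dev": "company-dev", "prod": "company-prod"}


def workspace_for(target):
    """Return the workspace name for a deployment target, or raise if unknown."""
    try:
        return WORKSPACES[target]
    except KeyError:
        known = ", ".join(sorted(WORKSPACES))
        raise ValueError(f"Unknown target {target!r}; expected one of: {known}") from None
```

The result can be passed as the workspace= option when creating a cluster, so a typo in the target name raises an error instead of silently using the default account.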