When performing computations across a distributed cluster on the cloud, there are a few things to keep in mind to help you be as effective as possible. This page contains suggestions for some best practices when using Coiled.
Use the same software locally and on Coiled¶
When using Coiled you’ll connect your local machine (e.g. your laptop) to a remote Dask cluster. It’s important that both your local and remote software environments have the same libraries installed in them. Version mismatches between these two environments can sometimes yield unexpected errors.
With this in mind, Coiled has a
coiled install command line utility
(documentation) which you can run locally to create
a conda environment on your machine with the same packages in a software
environment used locally. Run the following locally in a terminal:
$ coiled install <software-environment-name>
<software-environment-name> should be replaced with the name of the
Coiled software environment you want to install on your machine.
You can check whether or not the version of a package you have locally is the
same version that’s being used remotely on a Coiled cluster using Dask’s
import coiled from dask.distributed import Client # Create a Coiled cluster and connect a Dask Client to it cluster = coiled.Cluster(...) client = Client(cluster) # This will raise a ValueError if package versions do not match client.get_versions(packages=["xgboost", "numpy"], check=True)
Look at cluster logs¶
Coiled clusters keep track of various events that happen throughout a cluster’s
lifetime through maintaining a log on the scheduler and workers in the cluster.
These cluster logs provide insight into cluster activity and are particularly
useful when debugging tricky situations (e.g. when a
error is raised).
You can view logs for your Coiled clusters by clicking on the “View logs” button on the cluster dashboard page.
Use Dask best practices¶
The Dask community maintains a general Dask best practices page, as well as separate best practices pages for working with individual Dask collections:
It’s generally recommended to use Dask best practices with your Coiled workloads.