Using custom code on a Coiled cluster#

In this tutorial, you’ll learn more advanced techniques for installing custom Python code on an already running cluster, using a custom Docker image to create a Coiled software environment, or running additional commands after installing a package.

Replicate your local Python environment#

Package sync scans your local Python environment and replicates it on the cluster—even local packages and Git dependencies. It’s easier than building a Docker image, plus it launches significantly faster (see Package Synchronization). Package sync is used by default when you create a cluster.

Install a local Python module#

If you’re not using package sync, but you have custom Python packages you would like to use on your Coiled cluster, you can upload a Python module to all of your workers using Dask’s upload_file. This sends a local file to all your worker nodes on an already running cluster. This method is particularly helpful for when you would like to use custom Python functions that haven’t been packaged in your Cluster.

import coiled
from dask.distributed import Client

cluster = coiled.Cluster()
client = Client(cluster)


Docker images#

In addition to Package Synchronization, you can also create a Coiled Python environment using a Docker image. Coiled supports images on Docker Hub, Amazon ECR or Google Artifact Registry.


Your ability to use private images stored in Docker Hub or a cloud provider-specific registry is limited by which option you chose when initially setting up your Coiled account (see the Container Registry step for Google Cloud or AWS).

For example, if you chose to store your Coiled software environments in ECR, then you will not be able to use private Docker Hub images. If you would like to be able to use both Docker Hub and ECR reach out to us for help.

First, you’ll create a Coiled Python environment using the container= keyword argument of of coiled.create_software_environment(). For example, to create an environment using the publicly available RAPIDS image on Docker Hub:


For Amazon ECR or Google Artifact Registry, you can pass the full registry URL; for Amazon ECR, for example, this would resemble


To test that your container will run successfully on your Coiled cluster, you can run:

docker run --rm <your_container> python -m distributed.cli.dask_spec \
  --spec '{"cls":"dask.distributed.Scheduler", "opts":{}}'

If successful, this will start the dask.distributed scheduler (you can use CTRL+C to stop it). For example:

> docker run --rm daskdev/dask:latest python -m distributed.cli.dask_spec \
    --spec '{"cls":"dask.distributed.Scheduler", "opts":{}}'

2022-10-06 14:44:43,640 - distributed.scheduler - INFO - State start
2022-10-06 14:44:43,656 - distributed.scheduler - INFO - Clear task state
2022-10-06 14:44:43,658 - distributed.scheduler - INFO -   Scheduler at:    tcp://
2022-10-06 14:44:43,658 - distributed.scheduler - INFO -   dashboard at:                     :8787

If not, you will see an error like /opt/conda/bin/python: Error while finding module specification for 'distributed.cli.dask_spec' (ModuleNotFoundError: No module named 'distributed'). For example:

> docker run --rm continuumio/miniconda3:latest python -m distributed.cli.dask_spec \
    --spec '{"cls":"dask.distributed.Scheduler", "opts":{}}'

Unable to find image 'continuumio/miniconda3:latest' locally
latest: Pulling from continuumio/miniconda3
dc1f00a5d701: Already exists
a7a9c78d89b2: Already exists
44ac19016d77: Already exists
Digest: sha256:977263e8d1e476972fddab1c75fe050dd3cd17626390e874448bd92721fd659b
Status: Downloaded newer image for continuumio/miniconda3:latest
/opt/conda/bin/python: Error while finding module specification for 'distributed.cli.dask_spec' (ModuleNotFoundError: No module named 'distributed')

If the dask.distributed scheduler fails to start, it’s good to check that distributed is installed and the environment it is installed in has been activated. If you’re having trouble running your Docker container on your Coiled cluster, feel free to reach out to us for help.

Install pip-installable packages#

If you have a package that is pip-installable, but not yet publicly available on PyPI or conda-forge, for example, you can use Dask’s PipInstall Worker Plugin to pip install a set of packages. This is particularly useful for uploading modules that are still in development.

You can upload a public module in GitHub

from dask.distributed.diagnostics.plugin import PipInstall

plugin = PipInstall(packages=["git+<github url>"])
client.register_worker_plugin(plugin, name="<dependency name>")

If you want to install from a private repository you need to have a GitHub token set in your account by either having signed up with GitHub or by adding your GitHub token to your profile. GitHub tokens are stored with Coiled and then used on the machine that’s building the software environment; the token is not saved in the software environment.

from dask.distributed.diagnostics.plugin import PipInstall

plugin = PipInstall(packages=["git+https://{GITHUB_TOKEN}<repo>"])
client.register_worker_plugin(plugin, name="<dependency name>")


Using the name= argument will allow you to call PipInstall more than once, otherwise you might see a message from workers like {'tls://': {'status': 'repeat'}.

Upload a local directory#

Similar to the PipInstall Plugin, you can upload a local directory to your cluster by using the UploadDirectory Nanny Plugin.

You can upload a local directory from your machine to the cluster using:

from distributed.diagnostics.plugin import UploadDirectory

client.register_worker_plugin(UploadDirectory("/path/to/directory"), nanny=True)

It’s worth noting UploadDirectory will not install anything on its own. Ideally, you would package the code and directly use the PipInstall Worker Plugin mentioned above. However, if this is not possible you can write your own plugin for uploading and installation:

class InstallModulePlugin(UploadDirectory):
"""Use this plugging to upload a directory and install that directory in the workers."""
def __init__(self, dir_path, module):
    """Initializes the plugin

    dir_path: str, path to the directory you want to upload
    module: directory name
    super().__init__(dir_path, update_path=True)
    self.module = module

async def setup(self, nanny):
    await super().setup(nanny)

    import os
    import subprocess
    path_to_module = os.path.join(nanny.local_directory, self.module)

    # or whatever bash command to install package["pip", "install", "-e", path_to_module])

plugin = InstallModulePlugin("path_to_directory", "directory_name")
client.register_worker_plugin(plugin, nanny=True)