How to get local code on your cluster

The easiest way to upload your local Python code to your Coiled cluster is to package it and let Automatic Package Synchronization handle building the wheel and installing it on your cluster.

All importable Python packages installed in your environment are synced (e.g. packages installed with pip install -e, python setup.py install, python setup.py develop or similar). However, importable Python modules that only live in your local directory are not synced (e.g. folders with __init__.py or loose Python files).

This guide will go through:

  • How to turn your local Python code into an installable package

  • Confirming your package can be installed in your cluster

Turning Python code into a package

In this section, you’ll learn how to package a simple Python project. This guide was adapted from this PyPA tutorial and omits steps required to upload your package to PyPI, which is not required for using package sync.

Example project

This guide uses an example project named my_package. Create the following file structure locally:

my_project/
└── src/
    └── my_package/
        ├── __init__.py
        └── my_module.py

Open my_module.py and enter the following:

def inc(x):
    return x + 1

The src directory should contain all modules and packages meant for distribution (see the setuptools documentation on src-layout).

__init__.py is required to import the directory as a package, and should be empty.

my_module.py is an example of a module in the package that could contain classes, functions, etc.

For more background on Python modules and importing packages (i.e., why you need __init__.py), see the Python documentation on packages and modules.
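If you prefer to script the layout rather than create it by hand, a minimal sketch using only the standard library could look like the following. The helper name create_example_project is hypothetical and not part of Coiled or setuptools; it simply writes the files described above.

```python
from pathlib import Path

# Contents of my_module.py, as shown in this guide.
MODULE_SOURCE = "def inc(x):\n    return x + 1\n"


def create_example_project(root="."):
    """Create my_project/src/my_package under `root` (hypothetical helper)."""
    package_dir = Path(root) / "my_project" / "src" / "my_package"
    package_dir.mkdir(parents=True, exist_ok=True)
    # An empty __init__.py marks the directory as an importable package.
    (package_dir / "__init__.py").touch()
    (package_dir / "my_module.py").write_text(MODULE_SOURCE)
    return package_dir
```

Calling create_example_project(".") from the directory where you want my_project to live should produce the structure shown above.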

Creating pyproject.toml

Note

Projects using the deprecated setup.py style for packaging are still supported by package sync.

Create the pyproject.toml file in the top level of your my_project directory. Your file structure should now look like this:

my_project/
├── pyproject.toml
└── src/
    └── my_package/
        ├── __init__.py
        └── my_module.py

Open pyproject.toml and enter the following [build-system] and [project] tables:

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "my_package"
version = "0.0.1"

The [build-system] table tells build tools, like pip, which backend tool to use when building your project. You can choose from a number of backends; this guide uses setuptools, but you can also use others like Hatchling, Flit, or PDM.

  • requires is a list of packages needed to build your package. You don’t need to install these yourself; the build frontend you’re using, like pip, will automatically install them in a temporary, isolated environment when the package is built.

  • build-backend is the name of the Python object that the build frontend will use to perform the build.

The [project] table includes metadata about your project. name and version are required, though there are a number of keywords you can include (see the Python documentation on declaring project metadata).

  • name is the distribution name of your package. This can be any name as long as it only contains letters, numbers, ., _ , and -. It’s convention for the directory containing your Python files to match the project name.

  • version is the package version. See PyPA version specifiers.
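As a quick sanity check on the name rule above, here is a small sketch based on the PyPA name format specification, which also requires a name to start and end with a letter or number. The function names is_valid_name and normalize_name are illustrative only; the normalization step shows how installers treat ., _, and - as equivalent when comparing names.

```python
import re

# Letters, numbers, ".", "_" and "-", starting and ending with a letter or number.
_VALID_NAME = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


def is_valid_name(name):
    """Return True if `name` is an acceptable distribution name."""
    return _VALID_NAME.match(name) is not None


def normalize_name(name):
    """Lowercase the name and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()
```

Under these rules, my_package is a valid name, and it compares equal to my-package after normalization.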

Tip

Why weren’t my Python files included in my package?

A common problem is that you’ve created a package, but your Python files are not included, often manifesting as an ImportError.

To fix this problem, you can use setuptools-specific configuration options like py-modules and packages (see the setuptools documentation). Alternatively, you can use the find directive, which is particularly helpful for projects that don’t follow a src- or flat-layout:

[tool.setuptools.packages.find]
where = ["src"]  # list of folders that contain the packages (["."] by default)
include = ["my_package*"]  # package names should match these glob patterns (["*"] by default)
exclude = ["my_package.tests*"]  # exclude packages matching these glob patterns (empty by default)
namespaces = false  # to disable scanning PEP 420 namespaces (true by default)
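To get a feel for what these options select, the following is a rough standard-library approximation of the discovery step for the namespaces = false case. It is not setuptools' implementation, and discover_packages is a hypothetical helper: it treats every directory chain with __init__.py files as a package and then applies the include and exclude glob patterns to the dotted package names.

```python
import fnmatch
from pathlib import Path


def discover_packages(where, include=("*",), exclude=()):
    """Approximate which packages under `where` match include/exclude patterns."""
    root = Path(where)
    found = []
    for init_file in root.rglob("__init__.py"):
        parts = init_file.parent.relative_to(root).parts
        if not parts:
            continue  # `where` itself is not a package
        # Every level of the path must contain __init__.py (no namespace packages).
        if not all(
            (root.joinpath(*parts[:depth]) / "__init__.py").is_file()
            for depth in range(1, len(parts) + 1)
        ):
            continue
        name = ".".join(parts)
        included = any(fnmatch.fnmatchcase(name, pattern) for pattern in include)
        excluded = any(fnmatch.fnmatchcase(name, pattern) for pattern in exclude)
        if included and not excluded:
            found.append(name)
    return sorted(found)
```

For the example project, discover_packages("my_project/src", include=["my_package*"], exclude=["my_package.tests*"]) would be expected to return only my_package.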

Installing your package

Now that you’ve packaged your Python code, you can install my_package into your Python environment. You can do this in a new virtual environment, installing your package from a local src (see the PyPA documentation).

Note

In this guide, you’re installing my_package locally from src. It is also common to install packages from a version control system, like Git, which is also supported by package sync (see the pip documentation on VCS support).

You can do this either in Development Mode:

python3 -m pip install -e my_project

Or normally from src:

python3 -m pip install ./my_project

You’ll also want to have Coiled installed:

pip install coiled 'dask[complete]'

Validation

Import your package locally

Now that you’ve installed my_package, you can start a Python session and confirm you are able to import your package locally:

from my_package.my_module import inc

If there are modules missing from your package, you may get an import error. A good first step is to build a wheel with pip wheel ./my_project, extract the resulting .whl file (using unzip, for example), and inspect its contents. You may need to be more explicit in your pyproject.toml file about which files to include in your package (see the tip above).
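Because a wheel is a zip archive, you can also inspect it from Python without extracting it. The sketch below uses only the standard library; list_wheel_modules is a hypothetical helper, and the wheel filename in the usage note is an assumption about what setuptools produces for this example.

```python
import zipfile


def list_wheel_modules(wheel_path):
    """Return the sorted list of .py files stored inside a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(name for name in wheel.namelist() if name.endswith(".py"))
```

For this guide's project you would expect something like list_wheel_modules("my_package-0.0.1-py3-none-any.whl") to include my_package/__init__.py and my_package/my_module.py. If those entries are missing, the package discovery settings are the first place to look.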

Import your package in the cloud

Before you start your cluster, you can use the coiled package-sync scan CLI tool to check whether your package will be included in your cluster's environment:

coiled package-sync scan

You should see output similar to the following, with my_package listed at the top. If it were not built correctly, then in the last column you would see “Can build wheel” set to false, or the package would not be listed at all.

[Image: table of packages with rows for each Python package and columns for conda name, version, source, wheel target, path, and whether the wheel can be built.]

When you create a Coiled cluster, you can see the output from package sync as my_package is built:

import coiled

cluster = coiled.Cluster(n_workers=10)
client = cluster.get_client()
[Image: package sync build output]

Now, you can import and call inc in your cluster:

import dask
from my_package.my_module import inc

# Wrap the function itself so that inc runs on the cluster, not locally
dask.delayed(inc)(10).compute()
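If an import fails on the cluster, it can help to check each worker directly. The sketch below assumes client is a Dask distributed client such as the one returned by cluster.get_client(), and relies on its run method, which calls a function on every worker and returns the results keyed by worker address. The helper names package_visible and check_workers are hypothetical.

```python
import importlib.util


def package_visible(name="my_package"):
    """Return True if the top-level package `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def check_workers(client, name="my_package"):
    """Run the import check on every worker via client.run."""
    return client.run(package_visible, name)
```

Calling check_workers(client) should return True for every worker if package sync installed my_package; calling package_visible() on its own performs the same check in your local environment.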

Next steps

For more details on packaging a Python project, we recommend you follow the full PyPA tutorial.