Manual Software Environments#
Tip
We recommend using Coiled’s default Package Sync for managing software. In cases where package synchronization doesn’t work, you can instead recreate your local environment in the cloud using explicit software environments, which are discussed in this section.
Summary#
You can explicitly specify which packages to use in your cluster as in the following example:
import coiled

# Create software environment
coiled.create_software_environment(
    name="my-software-environment",
    conda={
        "channels": ["conda-forge"],
        "dependencies": ["python", "dask", "pyarrow"],
    },
    pip=["toolz"],
)

# Create cluster using that software environment
cluster = coiled.Cluster(
    software="my-software-environment",
    n_workers=10,
)
client = cluster.get_client()
Software environments let you specify exactly which packages to install, name that set of packages for later use, and then explicitly choose which environment a specific cluster uses.
Create software environments#
You can create your Python environment by installing dependencies with conda or pip, or by specifying a pre-built Docker image.
Conda#
You can create a conda environment using the conda keyword argument. You can pass a list of dependencies, for example:
coiled.create_software_environment(
    name="my-conda-env", conda=["python=3.9", "dask", "coiled", "xarray"]
)
Or you can pass a dictionary that specifies both channels and dependencies:
coiled.create_software_environment(
    name="my-conda-env",
    conda={
        "channels": ["conda-forge", "defaults"],
        "dependencies": ["python=3.9", "dask", "coiled", "xarray"],
    },
)
Or you can pass the path to a local environment.yml file:
coiled.create_software_environment(
    name="my-conda-env",
    conda="/path/to/environment.yml",
)
Here, /path/to/environment.yml is a local file that might look something like this (see the conda documentation on how to export your environment.yml file):
# environment.yml
channels:
- conda-forge
- defaults
dependencies:
- python==3.9
- dask==2023.2.0
- bokeh==2.4.3
- numba
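The dictionary form and the environment.yml form carry the same two fields, channels and dependencies. As a minimal sketch, assuming a simple spec whose entries are plain strings that need no YAML quoting, the hypothetical helper render_environment_yml below (not part of Coiled) writes out the text of a file equivalent to a dictionary spec:

```python
def render_environment_yml(spec):
    """Render a {"channels": [...], "dependencies": [...]} dict as environment.yml text.

    Hypothetical helper for illustration; handles only plain string entries.
    """
    lines = []
    for key in ("channels", "dependencies"):
        lines.append(f"{key}:")
        lines.extend(f"  - {item}" for item in spec.get(key, []))
    return "\n".join(lines) + "\n"


spec = {
    "channels": ["conda-forge", "defaults"],
    "dependencies": ["python=3.9", "dask", "coiled", "xarray"],
}
text = render_environment_yml(spec)
```

Writing text to a file then gives a path that could be passed as the conda argument in place of the dictionary.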
Tip
When creating an environment with conda, specify the Python version explicitly; otherwise, the highest supported version will be used.
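One way to follow this advice is to pin the cluster's Python to the version running locally. The sketch below assumes that matching the local major.minor version is what you want; conda_spec_with_local_python is a hypothetical helper, not part of Coiled:

```python
import re
import sys


def conda_spec_with_local_python(dependencies, channels=("conda-forge",)):
    """Build a conda spec dict whose Python pin matches the running interpreter.

    Hypothetical helper for illustration only.
    """
    python_pin = f"python={sys.version_info.major}.{sys.version_info.minor}"
    # Drop any existing python entry so the pin is not duplicated.
    others = [
        dep
        for dep in dependencies
        if re.split(r"[=<>!\s]", dep.strip(), maxsplit=1)[0] != "python"
    ]
    return {"channels": list(channels), "dependencies": [python_pin, *others]}


spec = conda_spec_with_local_python(["dask", "coiled", "xarray"])
```

The resulting spec can be passed as the conda argument of coiled.create_software_environment.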
Pip#
Similarly, you can use the pip keyword argument to install dependencies using pip. You can pass a list of dependencies, for example:
coiled.create_software_environment(
    name="my-pip-env",
    pip=["dask[complete]", "coiled", "xarray"],
)
Or you can pass a local requirements.txt file (see the pip documentation for more information on requirements files):
coiled.create_software_environment(
    name="my-pip-env",
    pip="requirements.txt",
)
where requirements.txt might look something like:
bokeh==2.4.3
click==8.1.3
cloudpickle==2.2.1
dask==2023.2.0
distributed==2023.2.0
fsspec==2023.1.0
Note
When creating an environment with pip, your Python version will be detected automatically and used in your cluster.
Note
Pip does not automatically install distributed along with dask. Specify dask with dask[complete] or dask[distributed] to ensure distributed is installed.
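A small pre-flight check can catch a bare dask requirement before the environment is built. The following is a minimal sketch using only the standard library; ensure_distributed is a hypothetical helper, not part of Coiled or pip, and it only understands simple "name[extras]specifier" strings:

```python
import re

# Requirement name, optional [extras], and the remainder (version specifiers).
_REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(\[[^\]]*\])?(.*)$")


def ensure_distributed(requirements):
    """Return a copy of the list in which dask is guaranteed to pull in distributed.

    Hypothetical helper: if distributed is not listed on its own and dask lacks
    the complete or distributed extra, the distributed extra is added.
    """
    parsed = [_REQUIREMENT.match(req) for req in requirements]
    names = {m.group(1).lower() for m in parsed if m}
    if "distributed" in names:
        return list(requirements)

    fixed = []
    for req, m in zip(requirements, parsed):
        if m and m.group(1).lower() == "dask":
            extras = [e.strip() for e in (m.group(2) or "[]")[1:-1].split(",") if e.strip()]
            if not {"complete", "distributed"} & {e.lower() for e in extras}:
                extras.append("distributed")
            fixed.append(f"dask[{','.join(extras)}]{m.group(3).strip()}")
        else:
            fixed.append(req)
    return fixed


checked = ensure_distributed(["dask", "coiled", "xarray"])
```

The checked list can then be passed as the pip argument.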
Private Repositories#
To use pip packages hosted in private repositories you must add a personal access token to your Coiled profile, which allows Coiled to pip install these packages on your behalf. To create a GitHub personal access token, follow the steps in GitHub’s guide. After you’ve created your access token, add it to your profile page at https://cloud.coiled.io/profile.
When specifying a pip package from a private repository use the format:
git+https://GIT_TOKEN@github.com/<github_account>/<github_repo>.git
For example:
coiled.create_software_environment(
    name="my-pip-env",
    pip=[
        "dask[complete]",
        "git+https://GIT_TOKEN@github.com/coiled/private_package.git",
    ],
)
Attention
For security reasons, you should not use your actual personal access token when specifying pip requirements. Instead, use the literal string GIT_TOKEN, which acts as a placeholder for your personal access token. Your actual access token will be populated when Coiled builds the corresponding software environment.
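To make it harder to paste a real token by accident, the requirement string can be generated and checked in code. This is a minimal sketch under stated assumptions: both functions are hypothetical helpers (not part of Coiled), the token in the test string is made up, and the check only inspects requirements written as plain URLs:

```python
from urllib.parse import urlsplit

PLACEHOLDER = "GIT_TOKEN"  # literal string that Coiled fills in at build time


def private_repo_requirement(account, repo):
    """Build a pip requirement for a private GitHub repo using the placeholder."""
    return f"git+https://{PLACEHOLDER}@github.com/{account}/{repo}.git"


def hardcoded_credentials(pip_requirements):
    """Return the requirements whose URL embeds credentials other than the placeholder."""
    flagged = []
    for req in pip_requirements:
        if "://" not in req:
            continue  # plain package name, nothing to check
        parts = urlsplit(req)
        if parts.username is not None and (parts.username != PLACEHOLDER or parts.password):
            flagged.append(req)
    return flagged


requirements = ["dask[complete]", private_repo_requirement("coiled", "private_package")]
problems = hardcoded_credentials(requirements)
```

An empty problems list means every URL either has no embedded credentials or uses the GIT_TOKEN placeholder.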
Docker#
You can also build environments based on Docker images using the container keyword argument, for example:
coiled.create_software_environment(
    name="my-docker-env",
    container="nvcr.io/nvidia/rapidsai/base:23.08-cuda11.8-py3.10",
)
This builds a software environment named “my-docker-env” from the specified RAPIDS image (here, the tag 23.08-cuda11.8-py3.10). See Docker for more details.
GPU#
Manual software environments are particularly useful when you launch jobs from a machine without a GPU but need GPU-accelerated libraries, such as CuPy, that require a GPU for installation.
In these cases, you must set gpu_enabled=True so that the CUDA version is set for conda. You can use any set of libraries that works for you, as long as the following are installed:
- dask, distributed: required for creating Dask clusters. Installable via conda or pip (see the Dask documentation).
- cudatoolkit >= 11.0: required for low-level compute optimization. Installable via conda-forge and nvidia (see the NVIDIA documentation).
- pynvml (optional): allows GPU metrics to appear in the Dask scheduler dashboard. Installable via conda or pip (see PyPI).
Here’s an example of a GPU-enabled environment for working with CuPy, cuML, and Optuna. Setting gpu_enabled=True ensures that GPU versions of the packages are installed.
import coiled

coiled.create_software_environment(
    name="cuml",
    conda={
        "channels": ["rapidsai", "nvidia", "conda-forge"],
        "dependencies": [
            "python=3.11",
            "dask",
            "cupy",
            "cuml",
            "optuna",
            "cudatoolkit",
            "pynvml",
        ],
    },
    gpu_enabled=True,  # sets CUDA version for conda
)
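The package requirements listed in this section can be checked against a dependency list before building. The sketch below is a hypothetical helper, not part of Coiled. It checks for dask by name rather than distributed, on the assumption (consistent with the example above) that the conda dask package brings distributed with it, and it does not verify the cudatoolkit version:

```python
import re

REQUIRED = ("dask", "cudatoolkit")
OPTIONAL = ("pynvml",)


def package_name(spec):
    """Extract the bare package name from a conda spec such as 'python=3.11'."""
    name = spec.split("::")[-1]  # drop an optional 'channel::' prefix
    return re.split(r"[=<>!\s]", name.strip(), maxsplit=1)[0].lower()


def missing_gpu_packages(dependencies):
    """Report which required and optional GPU packages are absent from the list."""
    present = {package_name(dep) for dep in dependencies}
    return {
        "required": [pkg for pkg in REQUIRED if pkg not in present],
        "optional": [pkg for pkg in OPTIONAL if pkg not in present],
    }


report = missing_gpu_packages(
    ["python=3.11", "dask", "cupy", "cuml", "optuna", "cudatoolkit", "pynvml"]
)
```

An empty "required" list means the dependency list names every package this sketch looks for.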
Manage Software Environments#
Update#
You can update an existing software environment by calling create_software_environment with the name of the software environment you want to update and the new specification for the software environment.
If the inputs to create_software_environment have changed since the last time it was called for a given software environment, the corresponding software environment will be rebuilt using the new inputs and any future uses of the software environment will use the updated version. Repeated calls to coiled.create_software_environment() with the same inputs are a no-op; to override this you can set force_rebuild=True.
You can’t update the software environment being used on an already running cluster. The cluster must first be closed and then restarted to use any updates made to a software environment.
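A forced rebuild might be wrapped as in the sketch below. rebuild_environment and its create parameter are hypothetical names introduced here so the call can be exercised without a Coiled account; by default it calls coiled.create_software_environment with the force_rebuild=True flag described above:

```python
def rebuild_environment(name, conda, create=None):
    """Rebuild a software environment even if its inputs are unchanged.

    `create` is an illustrative hook for testing; it defaults to
    coiled.create_software_environment.
    """
    if create is None:
        import coiled  # imported lazily so the sketch loads without coiled installed

        create = coiled.create_software_environment
    return create(name=name, conda=conda, force_rebuild=True)


# Demonstrate the arguments that would be sent, using a stand-in recorder.
calls = []
rebuild_environment(
    "my-conda-env",
    ["python=3.9", "dask", "coiled", "xarray"],
    create=lambda **kwargs: calls.append(kwargs),
)
```

Any cluster already running keeps its old environment; only clusters started after the rebuild use the new one.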
List#
The coiled.list_software_environments() function lists all available software environments:
coiled.list_software_environments()
There is also an account= keyword argument, which lets you specify the account whose software environments you want to list.
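As a short sketch of the account= argument, the hypothetical wrapper below lists environments for a named account. The account name "alice" and the list_envs hook are illustrative assumptions, and the shape of the return value is whatever coiled.list_software_environments returns:

```python
def list_environments_for(account, list_envs=None):
    """List software environments for one account.

    `list_envs` is an illustrative hook for testing; it defaults to
    coiled.list_software_environments.
    """
    if list_envs is None:
        import coiled  # imported lazily so the sketch loads without coiled installed

        list_envs = coiled.list_software_environments
    return list_envs(account=account)


# Stand-in callable that echoes the account it was asked about.
result = list_environments_for("alice", list_envs=lambda account: {"account": account})
```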
Delete#
The coiled.delete_software_environment() function can be used to delete individual software environments. For example:
coiled.delete_software_environment(name="alice/my-conda-env")
This deletes the software environment named “my-conda-env” in the Coiled account named “alice”.
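The name passed in the example above combines an account and an environment name, separated by a slash. A minimal sketch of building and splitting such names follows; both helpers are hypothetical and not part of Coiled:

```python
def qualified_environment_name(account, name):
    """Join an account and an environment name as 'account/name'."""
    if "/" in account or "/" in name:
        raise ValueError("account and name must not contain '/'")
    return f"{account}/{name}"


def split_environment_name(qualified):
    """Split 'account/name' into (account, name); account is None if absent."""
    account, sep, name = qualified.partition("/")
    return (account, name) if sep else (None, qualified)


target = qualified_environment_name("alice", "my-conda-env")
```

The resulting string can be passed as coiled.delete_software_environment(name=target).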