Advanced Usage#
Local packages and Git dependencies#
Package sync works with:
- Locally installed packages, like those installed via pip install -e <some-directory>.
- Python files in your working directory. Any .py files in your working directory will be synced to your remote environment.
- Packages installed from Git, like those installed via pip install git+ssh://git@github.com/dask/distributed.git@d74f5006.
Note
When installing packages from Git, you may need to add the --use-pep517 flag to pip, like:
pip install git+https://github.com/dask/distributed.git --use-pep517
Without this flag, pip may not record sufficient metadata to tell that the package was installed from Git.
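To check what pip recorded for a package, you can look for the direct_url.json file that PEP 610 defines for direct-URL installs. The sketch below assumes this is the relevant metadata; this page does not say which record package sync actually reads. The helper names parse_direct_url and git_origin are hypothetical.

```python
import json
from importlib import metadata


def parse_direct_url(text):
    """Return (url, commit_id) if a PEP 610 record describes a Git install, else None."""
    info = json.loads(text)
    vcs_info = info.get("vcs_info")
    if not vcs_info or vcs_info.get("vcs") != "git":
        return None
    return info.get("url"), vcs_info.get("commit_id")


def git_origin(dist_name):
    """Look up the Git origin recorded for an installed distribution, if any."""
    try:
        text = metadata.distribution(dist_name).read_text("direct_url.json")
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the file is absent from the metadata
    return parse_direct_url(text) if text else None
```

If git_origin("distributed") returns None after a Git install, reinstalling with --use-pep517 is worth trying.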
Warning
By default, your compiled local packages are uploaded to a private, Coiled-owned S3 bucket, then downloaded by your cluster. If having a copy of your source code in a Coiled S3 bucket violates your organization’s security policies, you can instead use your own S3 bucket. Reach out to us at support@coiled.io if you’d like to use this option.
Private PyPI URLs#
If you have a private (or other custom) PyPI index_url or extra-index-url, you will need to have it configured in one of the following ways:
- Environment variables: PIP_INDEX_URL, PIP_PYPI_URL, or UV_INDEX_URL for the primary index URL; PIP_EXTRA_INDEX_URL or UV_EXTRA_INDEX_URL for extra index URL(s).
- A pyproject.toml file in your working directory: in one of the [[tool.poetry.source]], [tool.pixi.pypi-options], or [tool.uv.pip] sections.
- A uv.toml file in your working directory: in the [pip] section.
- A pixi.toml file in your working directory: in the [pypi-options] section.
- Running pip config set 'global.extra-index-url' <URL> or pip config set 'global.index-url' <URL>.
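As a quick local check of the environment-variable option, the sketch below reports which index URLs are visible in the current environment. The function name, the order in which the primary variables are checked, and the example URLs in the usage are assumptions for illustration only; this is not how package sync itself resolves them.

```python
import os

PRIMARY_VARS = ("PIP_INDEX_URL", "PIP_PYPI_URL", "UV_INDEX_URL")
EXTRA_VARS = ("PIP_EXTRA_INDEX_URL", "UV_EXTRA_INDEX_URL")


def configured_index_urls(env=None):
    """Collect index URLs from the environment variables listed above."""
    env = os.environ if env is None else env
    # First non-empty primary variable wins (assumed order, see lead-in)
    primary = next((env[v] for v in PRIMARY_VARS if env.get(v)), None)
    extras = []
    for var in EXTRA_VARS:
        # Several extra index URLs can be given separated by whitespace
        extras.extend(env.get(var, "").split())
    return {"index_url": primary, "extra_index_urls": extras}
```

Calling configured_index_urls() with no arguments inspects os.environ; passing a dict makes it easy to try out different settings.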
If you usually pass the --extra-index-url or --index-url argument when you run pip install, no record of where the package came from is stored in the environment, so package sync will fail to install it on the cluster.
If you do not want to include the username and password in the URL locally, you can store them in a netrc file or use keyring credential storage. If you use keyring, the keyring package must be installed in your Python environment.
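To illustrate the netrc option, the sketch below writes a throwaway netrc file and reads it back with Python's standard netrc module. The host name and credentials are made up, and lookup_credentials is a hypothetical helper; it only shows what an entry for an index host looks like, not how pip or package sync consume it.

```python
import netrc
import os
import tempfile


def lookup_credentials(netrc_path, host):
    """Return (login, password) for host from a netrc file, or None if absent."""
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


# Demo with a temporary file; host and credentials are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "netrc")
    with open(path, "w") as f:
        f.write("machine pypi.example.com login alice password s3cret\n")
    demo = lookup_credentials(path, "pypi.example.com")
    missing = lookup_credentials(path, "other.example.com")
```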
GPUs#
If you don’t have a GPU locally, but would like to use GPUs on your remote cloud VMs, package sync will automatically translate between CPU and GPU versions of commonly used GPU-accelerated packages (like PyTorch, for example). This enables you to drive computations on cloud GPUs from any local hardware. See GPU Software for more details.
Extra conda packages#
If you have conda packages that you would like installed on your cluster that you do not have installed locally (e.g., system packages that are required by your dependencies on Linux), you can list them in the package_sync_conda_extras argument to coiled.Cluster.
Use this with caution: the dependencies of these packages are also installed via conda, which can introduce dependency conflicts.
Warning
This will not work for “noarch” conda packages, and should only be used for installing packages with platform-specific builds.
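A minimal sketch of passing extra conda packages, assuming the argument takes a list of package names. The package name is a placeholder, and launch_cluster is a hypothetical wrapper; calling it requires a configured Coiled account.

```python
# Placeholder name; replace with the platform-specific conda packages
# your dependencies need on Linux.
CONDA_EXTRAS = ["some-linux-only-package"]


def launch_cluster():
    """Start a cluster that installs CONDA_EXTRAS in addition to synced packages."""
    # Imported here so this module can be loaded without coiled installed
    import coiled

    return coiled.Cluster(
        n_workers=2,
        package_sync_conda_extras=CONDA_EXTRAS,
    )
```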
Ignoring packages#
If you have packages installed locally that you don’t want synced to the cluster, you can list them in the package_sync_ignore argument to coiled.Cluster. This is generally not needed, though, because package sync installation on the cluster is so fast that installing extra, unused packages has a negligible effect on cluster startup time.
Note that only these exact packages are ignored—their dependencies may still be installed. Additionally, if another package depends on them, they will still be installed.
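The rule above can be illustrated with a toy model. This is a simplified sketch, not Coiled's implementation: resolve_installed is a hypothetical function, and the dependency map is a hand-written stand-in for what a real installer would resolve.

```python
def resolve_installed(local_packages, ignored, depends_on):
    """Toy model: drop ignored packages, then pull in dependencies of the rest."""
    requested = [p for p in local_packages if p not in ignored]
    installed = set()
    stack = list(requested)
    while stack:
        pkg = stack.pop()
        if pkg in installed:
            continue
        installed.add(pkg)
        # Dependencies are installed even if they were on the ignore list
        stack.extend(depends_on.get(pkg, []))
    return installed


deps = {"pandas": ["numpy"], "jupyterlab": ["tornado"]}
local = ["pandas", "numpy", "jupyterlab", "tornado"]

# numpy is ignored but still installed, because pandas depends on it
ignore_numpy = resolve_installed(local, {"numpy"}, deps)
# jupyterlab is dropped, but its dependency tornado is not ignored and stays
ignore_jupyterlab = resolve_installed(local, {"jupyterlab"}, deps)
```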
Cross-platform fuzzing#
When using a macOS or Windows machine to launch clusters (which always run Linux), you may not get exactly the same versions of all packages on the cluster as you have locally. This is because packages sometimes require slightly different dependencies on different platforms, so package sync uses a looser version match with cross-platform clusters. If you have trouble using package sync for cross-platform clusters, we recommend creating a new environment and installing only the packages you need to run.
Mandatory packages#
Package sync will refuse to start a cluster if you don’t have these basic packages for running Dask installed locally:
dask
distributed
tornado
cloudpickle
msgpack
Additionally, package sync ensures that the versions of these packages match exactly with what you have locally, even cross-platform.
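Before launching a cluster, you can check locally that these packages are present. The sketch below uses importlib.metadata from the standard library; the function names are hypothetical, and it assumes the installed distribution names match the names listed above.

```python
from importlib import metadata

MANDATORY = ("dask", "distributed", "tornado", "cloudpickle", "msgpack")


def mandatory_versions(get_version=metadata.version):
    """Map each mandatory package to its installed version, or None if missing."""
    versions = {}
    for name in MANDATORY:
        try:
            versions[name] = get_version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def missing_mandatory(versions):
    """List the mandatory packages that have no installed version."""
    return [name for name, version in versions.items() if version is None]
```

Running missing_mandatory(mandatory_versions()) in your local environment returns an empty list when all five packages are installed.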