GPUs

It’s easy to start a GPU-enabled Coiled cluster with:

cluster = coiled.Cluster(worker_gpu=1)

In practice, though, you often want to specify a bit more, including a software environment and some additional configuration.

Here’s an example showing how to create a PyTorch software environment, start a cluster using that environment, and check that PyTorch is installed as expected and GPUs are available. Each of these steps is covered in more detail in the following sections.

import coiled

coiled.create_software_environment(
    name="pytorch",
    conda={
        "channels": ["pytorch", "nvidia", "conda-forge", "defaults"],
        "dependencies": [
            "python=3.11",
            "dask=2023.5.1",
            "pytorch",
            "optuna",
            "torchvision",
            "cudatoolkit",
            "pynvml",
        ],
    },
    gpu_enabled=True,
)

cluster = coiled.Cluster(
    software="pytorch",
    n_workers=20,                    # 20 workers
    worker_gpu=1,                    # one GPU per worker
    worker_options={"nthreads": 1},  # one thread per GPU
)


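def test_gpu():
    """Return (GPU count, device name) for this worker, or None without CUDA."""
    import torch

    if torch.cuda.is_available():
        num_gpus = torch.cuda.device_count()
        device_index = torch.cuda.current_device()
        device_name = torch.cuda.get_device_name(device_index)

        return num_gpus, device_name

    return None  # no CUDA device is visible to this worker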


client = cluster.get_client()
f = client.submit(test_gpu)
f.result()

client.shutdown()
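
Note that client.submit runs test_gpu on a single worker. To check every worker before shutting down, one option is Dask's Client.run, which calls a function on each worker and returns a dictionary keyed by worker address. The helper below is a minimal sketch of how such a dictionary could be summarized. The name summarize_gpu_check is hypothetical (it is not part of Coiled or Dask), and the sample addresses and device names are made up for illustration.

```python
def summarize_gpu_check(results):
    """Split workers into those that found a GPU and those that did not.

    `results` maps worker address -> (num_gpus, device_name), or None when
    the worker found no CUDA device (which is what test_gpu returns).
    """
    with_gpu = {}
    without_gpu = []
    for address, result in sorted(results.items()):
        if result is None or result[0] == 0:
            without_gpu.append(address)
        else:
            with_gpu[address] = result
    return with_gpu, without_gpu


# On a live cluster the input could come from: results = client.run(test_gpu)
# The values below are made up for illustration only.
sample = {
    "tls://10.0.0.1:8786": (1, "Tesla T4"),
    "tls://10.0.0.2:8786": None,
}
with_gpu, without_gpu = summarize_gpu_check(sample)
print("workers with a GPU:", with_gpu)
print("workers without a GPU:", without_gpu)
```

An empty second list suggests that every worker can see at least one GPU.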

Python environment

Most of the time we recommend using Automatic Package Synchronization to replicate your local Python environment on your cluster. However, GPU-enabled clusters are usually started from machines without GPUs, such as a personal laptop, and a number of libraries required for working with GPUs have GPU-specific hardware requirements. As a result, a local environment often cannot provide the GPU builds the cluster needs.

In that case, you can create a Coiled software environment instead, either by specifying a list of conda packages or by providing an existing Docker container.

You can use any software environment that works for you, as long as it has the following:

  • dask and distributed are required for creating Dask clusters. Both are installable via conda or pip (see the Dask documentation).

  • cudatoolkit >= 11.0 is required for low-level compute optimization. Installable via conda-forge and nvidia (see the NVIDIA documentation).

  • pynvml (optional) allows GPU metrics to appear in the Dask scheduler dashboard. Installable via conda or pip (see PyPI).
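
As a quick sanity check on the list above, the following sketch reports which of the Python packages cannot be imported in the current environment. It uses only the standard library. The name missing_modules is a hypothetical helper, not a Coiled function. Note that cudatoolkit is a conda package rather than an importable Python module, so it is not covered by this check.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["dask", "distributed"]
optional = ["pynvml"]  # only needed for GPU metrics in the dashboard

print("missing required:", missing_modules(required))
print("missing optional:", missing_modules(optional))
```

Run locally, this inspects your own machine. To inspect the cluster environment instead, the same function could be sent to a worker, for example with client.submit(missing_modules, required).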

Conda

Here’s an example of a GPU-enabled environment for working with PyTorch and Optuna. Setting gpu_enabled=True is required so that the CUDA version is set for conda, which ensures that the GPU versions of packages are installed.

import coiled

coiled.create_software_environment(
    name="pytorch",
    conda={
        "channels": ["pytorch", "nvidia", "conda-forge", "defaults"],
        "dependencies": [
            "python=3.11",
            "dask=2023.5.1",
            "pytorch",
            "optuna",
            "torchvision",
            "cudatoolkit",
            "pynvml",
        ],
    },
    # sets CUDA version for conda
    gpu_enabled=True,
)

See PyTorch and Optuna for a worked example using this environment.

Docker container

You can also create an environment from an existing Docker container. For example, the RAPIDS image is publicly available on the container registry for NVIDIA GPU Cloud. It includes a number of open source GPU-accelerated libraries and APIs including cuDF, cuML, and xgboost (see the RAPIDS documentation).

import coiled

coiled.create_software_environment(
    name="rapids-stable",
    container="nvcr.io/nvidia/rapidsai/base:23.08-cuda11.8-py3.10",
)

See RAPIDS for an example using this environment.

Creating a GPU cluster

You can create a cluster with GPU-enabled machines by using the worker_gpu argument. For example, using the pytorch software environment we created above:

cluster = coiled.Cluster(
    software="pytorch",
    n_workers=4,
    worker_gpu=1,  # single T4 per worker
    worker_options={"nthreads": 1},  # one task per worker to avoid GPU oversaturation
)

Specifying worker_gpu=1 will default to requesting a g4dn.xlarge (NVIDIA T4) instance type if you are using AWS or n1-standard-4 if you’re using Google Cloud. You can also request specific instance types, including multiple GPUs per worker (see Allowable Instance Types).

AWS instance types

If you are using AWS, you can request a specific instance type with the worker_vm_types keyword argument. For example, you could request the p3.8xlarge instance type with 4 GPUs:

cluster = coiled.Cluster(
    software="pytorch",
    n_workers=4,
    worker_vm_types=["p3.8xlarge"],  # four NVIDIA V100s per worker
)

GCP instance types

If you are using Google Cloud, you can request specific instance types using the worker_gpu and the worker_vm_types keyword arguments. To choose a specific instance type you need both arguments, because Google Cloud attaches GPUs to instances separately rather than bundling them with the instance type. The one exception is the A100, which is bundled with the A2 instance types such as a2-highgpu-1g. For all other GPUs, you will also need to use an instance type from the N1 machine series. See the Google Cloud documentation on GPUs for more details.

You can request a cluster with two T4 GPUs per worker:

cluster = coiled.Cluster(
    software="pytorch",
    n_workers=2,
    worker_gpu=2,  # two T4s per worker
)

Or use worker_vm_types to specifically request two A100 GPUs per worker:

cluster = coiled.Cluster(
    software="pytorch",
    n_workers=2,
    worker_vm_types=["a2-highgpu-2g"],  # two A100s per worker
)
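
The rules from the AWS and Google Cloud sections can be collected in one place. The sketch below is a hypothetical helper (gpu_worker_kwargs is not part of Coiled) that returns the keyword arguments described above for a given cloud. The backend labels "aws" and "gcp" and the prefix checks on instance type names are assumptions made for this illustration, based only on the rules stated in this section.

```python
def gpu_worker_kwargs(backend, gpus=1, vm_type=None):
    """Build GPU-related keyword arguments for coiled.Cluster (hypothetical helper).

    backend: "aws" or "gcp" (labels chosen for this sketch).
    """
    if vm_type is None:
        # No specific instance type requested: rely on the worker_gpu default.
        return {"worker_gpu": gpus}
    if backend == "aws":
        # On AWS the instance type already determines the number of GPUs.
        return {"worker_vm_types": [vm_type]}
    if backend == "gcp":
        if vm_type.startswith("a2-"):
            # A100s are bundled with the A2 instance types.
            return {"worker_vm_types": [vm_type]}
        if not vm_type.startswith("n1-"):
            raise ValueError("other GPUs on Google Cloud need an N1 instance type")
        return {"worker_gpu": gpus, "worker_vm_types": [vm_type]}
    raise ValueError(f"unknown backend: {backend!r}")


print(gpu_worker_kwargs("aws", vm_type="p3.8xlarge"))
print(gpu_worker_kwargs("gcp", gpus=2, vm_type="n1-standard-8"))
print(gpu_worker_kwargs("gcp", vm_type="a2-highgpu-2g"))
```

The result could then be unpacked into the constructor, for example coiled.Cluster(software="pytorch", n_workers=2, **gpu_worker_kwargs("gcp", gpus=2, vm_type="n1-standard-8")).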

Launching a Jupyter notebook

If you’d like to launch a Jupyter notebook, you can set jupyter=True and Coiled will launch a notebook server running on the scheduler. This can be particularly helpful when you don’t have access to GPUs on your local machine.

cluster = coiled.Cluster(
    software="pytorch",  # specify the software env you want
    jupyter=True,  # run Jupyter server on scheduler
    n_workers=4,
    worker_gpu=1,  # single T4 per worker
    worker_options={"nthreads": 1},
)

print(cluster.jupyter_link)

The link will take you directly to Jupyter running on the scheduler, where you can run any code that requires a GPU. You can also use client = distributed.Client() to get a client which submits work to your cluster.

You can also use Coiled notebooks to launch a Jupyter notebook on a GPU instance (see our blog post for an example).

Pricing

Normally, one Coiled credit corresponds roughly to one CPU-hour (see our pricing page). GPU instances are more expensive. You can query information about instance types (including cost) with the following function call:

coiled.list_instance_types(backend="aws", gpus=1)

Then look for the coiled_credits_per_hour key. For example, to see how much the default g4dn.xlarge instance costs per hour:

coiled.list_instance_types(backend="aws", gpus=1)["g4dn.xlarge"][
    "coiled_credits_per_hour"
]
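
To compare several instance types at once, the mapping can be sorted by cost. The sketch below assumes only what is shown above: a dictionary keyed by instance type name whose values contain a coiled_credits_per_hour entry. The helper names are hypothetical, and the sample numbers are placeholders, not real prices.

```python
def cheapest_instance_types(instance_types, limit=3):
    """Return up to `limit` (name, credits_per_hour) pairs, cheapest first."""
    costs = [
        (name, info["coiled_credits_per_hour"])
        for name, info in instance_types.items()
    ]
    return sorted(costs, key=lambda pair: pair[1])[:limit]


def estimated_worker_credits(credits_per_hour, n_workers, hours):
    """Rough credit estimate for the workers only (the scheduler is not included)."""
    return credits_per_hour * n_workers * hours


# With Coiled installed, the input could come from:
#   instance_types = coiled.list_instance_types(backend="aws", gpus=1)
# The numbers below are placeholders for illustration, not real prices.
sample = {
    "type-a": {"coiled_credits_per_hour": 12.0},
    "type-b": {"coiled_credits_per_hour": 5.0},
    "type-c": {"coiled_credits_per_hour": 40.0},
}

ranked = cheapest_instance_types(sample, limit=2)
print(ranked)
print(estimated_worker_credits(ranked[0][1], n_workers=4, hours=2))
```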

Use Cases

For more examples of what you can do with a GPU cluster, see: