GPUs#
Quickstart#
Coiled supports running computations on GPU-enabled machines. You can first create a Coiled software environment using the RAPIDS image, which is publicly available from the NVIDIA GPU Cloud container registry:
import coiled

coiled.create_software_environment(
    name="rapids-stable",
    container="nvcr.io/nvidia/rapidsai/rapidsai-core:23.04-cuda11.8-runtime-ubuntu22.04-py3.10",
)
Then you can create a cluster with GPU-enabled machines by using the worker_gpu argument:
cluster = coiled.Cluster(
    software="rapids-stable",
    n_workers=4,
    scheduler_gpu=True,  # recommended
    worker_gpu=1,  # single T4 per worker
    worker_class="dask_cuda.CUDAWorker",  # recommended
    environ={"DISABLE_JUPYTER": "true"},  # needed for "stable" RAPIDS image
)
You'll notice a few arguments that we recommend but are not required:

- scheduler_gpu=True supports Dask's (de)serialization
- worker_class="dask_cuda.CUDAWorker" can be helpful if you're using multiple GPUs per worker
- environ={"DISABLE_JUPYTER": "true"} is only needed if you are using the stable RAPIDS image; it disables the default Jupyter server from starting
Verification#
You can then verify this cluster is working as expected:
from dask.distributed import Client

def test_gpu():
    import cupy as cp

    # create a small array on the GPU and sum it
    x = cp.arange(6).reshape(2, 3).astype("f")
    result = x.sum()
    return cp.asnumpy(result), str(result.device)

client = Client(cluster)
f = client.submit(test_gpu)
f.result()
This should return (array(15., dtype=float32), '<CUDA Device 0>').
You can also verify workers are using GPUs with cluster.scheduler_info["workers"].
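As a minimal sketch, you could inspect each worker's reported metadata directly. The exact nesting of GPU metrics under each worker's "metrics" entry is an assumption here; it typically appears only when pynvml is installed on the workers:

# A minimal sketch: print the GPU metrics each worker reports to the scheduler.
# GPU metrics under "metrics" are an assumption; they require pynvml on the workers.
for address, info in cluster.scheduler_info["workers"].items():
    print(address, info.get("metrics", {}).get("gpu"))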
Requesting Instance Types#
When creating a cluster, specifying worker_gpu=1 will default to requesting a g4dn.xlarge (NVIDIA T4) instance type if you are using AWS, or n1-standard-4 if you're using Google Cloud. You can also request specific instance types, including multiple GPUs per worker.
AWS#
If you are using AWS, you can request a specific instance type with the worker_vm_types keyword argument. For example, you could request the p3.8xlarge instance type, which has four GPUs:
cluster = coiled.Cluster(
    software="rapids-stable",
    n_workers=4,
    worker_vm_types=["p3.8xlarge"],  # four NVIDIA V100s per worker
    worker_class="dask_cuda.CUDAWorker",
    environ={"DISABLE_JUPYTER": "true"},  # needed for "stable" RAPIDS image
)
Google Cloud#
If you are using Google Cloud, you can request specific instance types using the worker_gpu and worker_vm_types keyword arguments. You need both arguments since Google Cloud attaches GPUs to different instance types (the one exception being the A100, which is bundled with the a2-highgpu-1g instance type). See the Google Cloud documentation on GPUs for more details. You will also need to use an instance type from the N1 machine series.
You can request a cluster with two T4 GPUs per worker:
cluster = coiled.Cluster(
    software="rapids-stable",
    n_workers=2,
    worker_gpu=2,  # two T4s per worker
    worker_class="dask_cuda.CUDAWorker",
    environ={"DISABLE_JUPYTER": "true"},  # needed for "stable" RAPIDS image
)
Or use worker_vm_types to specifically request two A100 GPUs per worker:
cluster = coiled.Cluster(
    software="rapids-stable",
    n_workers=2,
    worker_vm_types=["a2-highgpu-2g"],  # two A100s per worker
    worker_class="dask_cuda.CUDAWorker",
    environ={"DISABLE_JUPYTER": "true"},  # needed for "stable" RAPIDS image
)
Software Environments#
We recommend using the publicly available RAPIDS image if it has the packages you need. It includes a number of open source GPU-accelerated libraries and APIs including cuDF, cuML, and xgboost (see the RAPIDS documentation).
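For example, with the rapids-stable cluster from the quickstart above, a minimal sketch to confirm cuDF is importable on a worker might look like this (the check_cudf helper is hypothetical):

from dask.distributed import Client

def check_cudf():
    # build a small DataFrame on the GPU and sum a column
    import cudf
    df = cudf.DataFrame({"a": [1, 2, 3]})
    return int(df["a"].sum())

client = Client(cluster)
print(client.submit(check_cudf).result())  # 6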
If the RAPIDS image does not have what you need (e.g. PyTorch and TensorFlow are not included), you can use any software environment that works for you. In this case, you'll need to make sure you have the following:
| Package | Description | Installation |
| --- | --- | --- |
| dask | Required, for creating Dask clusters | Conda or pip installable (see the Dask documentation on installation) |
| cudatoolkit | Required, for low-level compute optimization | Not available on PyPI, installable with conda via a number of channels including conda-forge and nvidia (see the NVIDIA documentation on installation) |
| dask-cuda | Required, only if you are using the Dask CUDA worker class (e.g. worker_class="dask_cuda.CUDAWorker") | Conda or pip installable (see the RAPIDS Dask-CUDA documentation on installation) |
| pynvml | Optional, allows GPU metrics to appear in the Dask scheduler dashboard | Conda or pip installable (see the description on PyPI) |
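As a minimal sketch (the environment name my-gpu-env, the channel list, and the version pins are illustrative, not prescribed), a custom conda software environment covering these packages might look like:

import coiled

coiled.create_software_environment(
    name="my-gpu-env",  # hypothetical name for this sketch
    gpu_enabled=True,  # select CUDA-enabled builds of packages
    conda={
        "channels": ["rapidsai", "nvidia", "conda-forge"],
        "dependencies": [
            "python=3.10",
            "dask",  # required: creates Dask clusters
            "cudatoolkit=11.8",  # required: conda-only CUDA runtime
            "dask-cuda",  # required only for dask_cuda.CUDAWorker
            "pynvml",  # optional: GPU metrics in the dashboard
        ],
    },
)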
Example GPU Conda environment#
Let's suppose you want to use Optuna, which isn't included in the pre-built RAPIDS Docker image.
Let's also suppose that you don't have a GPU on your local machine, which means you can't install packages that require a GPU, so Package Synchronization isn't a good fit.
You can build a Coiled environment by specifying the conda packages to install, like this:
coiled.create_software_environment(
    name="rapids-optuna",
    gpu_enabled=True,  # sets the CUDA version for conda so GPU versions of packages get installed
    conda={
        "channels": ["nvidia", "rapidsai", "conda-forge", "defaults"],
        "dependencies": [
            "rapids=23.02",
            "python=3.10",
            "cudatoolkit=11.8",
            "jupyterlab",
            "dask-labextension>=6.1.0",
            "optuna",
        ],
    },
)
You can then start a cluster using this software environment. If you'd like to launch a Jupyter notebook, you can set jupyter=True and Coiled will launch a notebook server running on the scheduler:
cluster = coiled.Cluster(
    software="rapids-optuna",  # specify the software env you just created
    jupyter=True,  # run Jupyter server on scheduler
    scheduler_gpu=True,  # add GPU to scheduler
    n_workers=4,
    worker_gpu=1,  # single T4 per worker
    worker_class="dask_cuda.CUDAWorker",  # recommended
)

print(cluster.jupyter_link)
The link will take you directly to Jupyter running on the scheduler, where you can run any code that requires a GPU. You can also use client = distributed.Client() to get a client that submits work to your cluster.
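For instance, from a notebook cell in that Jupyter session (a minimal sketch, relying on the bare Client() call picking up the cluster's scheduler as described above):

from distributed import Client

client = Client()  # connects to this cluster's scheduler
future = client.submit(lambda x: x + 1, 10)  # runs on one of the cluster's workers
print(future.result())  # 11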
Next Steps#
For more examples of what you can do with a GPU cluster, see the RAPIDS Cloud ML Examples.
Pricing#
Coiled has a generous free tier for getting started or for modest workloads, but as you scale up you may wish to know how GPU usage is billed.
To make our pricing especially simple for typical users of CPU-only instance types, 1 Coiled Credit corresponds to 1 CPU-Hour. We also charge for running GPUs. You can query our API for the exact number of Coiled Credits per Hour for each GPU instance type you care about by looking at the coiled_credits_per_hour key in these dictionaries:
import coiled

cloud_provider = "aws"  # or "gcp"
# match instances with 1 GPU, or (e.g.) gpus=[1, 8] to see instances with 1–8 GPUs
print(coiled.list_instance_types(backend=cloud_provider, gpus=1))
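Assuming the returned value maps instance type names to per-type attribute dictionaries (an assumption based on the key name above; inspect the output in your own session), you could pull out the rate directly:

# A minimal sketch: print the credits-per-hour rate of each single-GPU AWS instance type.
for name, info in coiled.list_instance_types(backend="aws", gpus=1).items():
    print(name, info["coiled_credits_per_hour"])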
For the cost of a Coiled Credit and information about bulk discounts, see the Pricing page.