GPUs#
GPUs can provide massive performance boosts for workflows like training ML models, computer vision, analytics, and more. Coiled makes it straightforward to use GPU hardware on the cloud.
Run Python functions on a GPU VM with Coiled Functions:

import coiled

@coiled.function(
    vm_type="g5.8xlarge",  # Specify any GPU instance type
    region="us-west-2",    # Run in any region
)
def predict(filename):
    ...

Run your local scripts on a GPU VM, in any region, from the command line:

$ coiled run python train.py \
    --vm-type g5.8xlarge \
    --region us-west-2

Create a Dask cluster with GPU workers:

import coiled

cluster = coiled.Cluster(
    n_workers=50,
    worker_vm_types="g5.8xlarge",  # Specify any GPU instance type
    region="us-west-2",            # Run in any region
)

Or start a Jupyter notebook on a GPU VM, with your local files synced:

$ coiled notebook start \
    --vm-type g5.8xlarge \
    --region us-west-2 \
    --sync
This page summarizes how to use GPU hardware effectively with Coiled. In particular, we discuss:

- Software: replicating your local software environment on GPU VMs
- Hardware: choosing GPU instance types across cloud providers
- Observability: monitoring GPU utilization and memory
- Cost: pricing, spending limits, and avoiding idle resources
Software#
By default, Coiled’s automatic package synchronization handles inspecting your local Python software environment and replicating it on remote cloud VMs. This works well across both CPU and GPU environments.
In the case where a local software environment doesn’t have a GPU, but remote cloud VMs do, package sync will automatically translate between CPU and GPU versions of commonly used GPU-accelerated packages (for example, PyTorch). This enables you to drive computations on cloud GPUs from any local hardware.
For example, if you want to run PyTorch GPU code on a remote VM, just install PyTorch locally:

$ pip install torch

and then run your script on a GPU-enabled VM on Coiled:

$ coiled run --gpu python myscript.py

Coiled will install the appropriate drivers and libraries to match the remote hardware.
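As a minimal sketch, the hypothetical myscript.py above needs nothing Coiled-specific; it just uses the GPU when one is present:

# myscript.py
import torch

# Use the GPU when available; package sync ensures the CUDA-enabled
# build of PyTorch is installed on the remote VM
device = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(128, 10).to(device)
x = torch.randn(32, 128, device=device)
print(model(x).shape)  # torch.Size([32, 10])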
If for some reason automatic package synchronization doesn’t fit your use case well, use a Docker image or manual software environment for managing software.
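For instance, here is a sketch of building a manual software environment with coiled.create_software_environment (the environment name and package list are illustrative assumptions, not requirements):

import coiled

# Build a named software environment from an explicit conda specification
# (environment name and packages here are illustrative)
coiled.create_software_environment(
    name="pytorch-gpu",
    conda={
        "channels": ["conda-forge"],
        "dependencies": ["python=3.11", "pytorch"],
    },
)

You can then reference the environment by name, for example coiled.Cluster(software="pytorch-gpu", ...).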
Hardware#
You can run on any GPU hardware available on your cloud provider, in any region. See the pricing section for a complete list of available instance types.
On AWS:

import coiled

@coiled.function(
    vm_type="g5.xlarge",  # NVIDIA A10G GPU instance
    region="us-west-2",
)
def process(filename):
    ...

On Google Cloud:

import coiled

@coiled.function(
    vm_type="g2-standard-4",  # NVIDIA L4 GPU instance
    region="us-east1",
)
def process(filename):
    ...

On Azure:

import coiled

@coiled.function(
    vm_type="Standard_NV12ads_A10_v5",  # NVIDIA A10 GPU instance
    region="westus2",
)
def process(filename):
    ...
Observability#
GPUs can be costly, so making the most of the available hardware is key. Among other things, Coiled automatically tracks GPU utilization and memory usage metrics to give visibility into how effectively hardware is being used. Often you can tune certain aspects of your workload to achieve higher GPU utilization (for example, ML model batch size).
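For instance, in PyTorch the batch size is typically set on the data loader; raising it (within GPU memory limits) usually increases utilization. A generic sketch, not specific to any Coiled API:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic dataset standing in for real training data
dataset = TensorDataset(torch.randn(10_000, 128), torch.randint(0, 10, (10_000,)))

# Larger batches keep the GPU busier per step; watch the GPU utilization
# and memory metrics in the Coiled dashboard while tuning this value
loader = DataLoader(dataset, batch_size=256, shuffle=True)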
Cost#
It’s always good to know how much your computations will cost and to avoid unexpectedly large bills. This is especially true when using GPU hardware, which tends to be more expensive than CPU-only hardware.
This section covers GPU hardware costs (see pricing table), setting limits on how much you can spend, and how to avoid idle resources.
Pricing#
Coiled pricing corresponds directly to the size and duration of cloud resources used (see our pricing page). Below are tables of all supported GPU-enabled instance types on each cloud provider, along with their Coiled cost.
AWS:

| Instance type | CPU | Memory | GPU | Price* |
|---|---|---|---|---|
| g4dn.xlarge | 4 | 16 GiB | T4 | $0.45/hr |
| g6.xlarge | 4 | 16 GiB | L4 | $0.45/hr |
| g5.xlarge | 4 | 16 GiB | A10G | $0.45/hr |
| g4dn.2xlarge | 8 | 32 GiB | T4 | $0.65/hr |
| g6.2xlarge | 8 | 32 GiB | L4 | $0.65/hr |
| g5.2xlarge | 8 | 32 GiB | A10G | $0.65/hr |
| g4dn.4xlarge | 16 | 64 GiB | T4 | $1.05/hr |
| g6.4xlarge | 16 | 64 GiB | L4 | $1.05/hr |
| g5.4xlarge | 16 | 64 GiB | A10G | $1.05/hr |
| gr6.4xlarge | 16 | 128 GiB | L4 | $1.05/hr |
| p3.2xlarge | 8 | 61 GiB | V100 | $1.40/hr |
| g6.8xlarge | 32 | 128 GiB | L4 | $1.85/hr |
| g4dn.8xlarge | 32 | 128 GiB | T4 | $1.85/hr |
| g5.8xlarge | 32 | 128 GiB | A10G | $1.85/hr |
| gr6.8xlarge | 32 | 256 GiB | L4 | $1.85/hr |
| g4dn.12xlarge | 48 | 192 GiB | T4 (x4) | $3.40/hr |
| g6.12xlarge | 48 | 192 GiB | L4 (x4) | $3.40/hr |
| g5.12xlarge | 48 | 192 GiB | A10G (x4) | $3.40/hr |
| g6.16xlarge | 64 | 256 GiB | L4 | $3.45/hr |
| g5.16xlarge | 64 | 256 GiB | A10G | $3.45/hr |
| g4dn.16xlarge | 64 | 256 GiB | T4 | $3.45/hr |
| p3.8xlarge | 32 | 244 GiB | V100 (x4) | $5.60/hr |
| g6.24xlarge | 96 | 384 GiB | L4 (x4) | $5.80/hr |
| g5.24xlarge | 96 | 384 GiB | A10G (x4) | $5.80/hr |
| g4dn.metal | 96 | 384 GiB | T4 (x8) | $6.80/hr |
| p4d.24xlarge | 96 | 1.12 TiB | A100 (x8) | $6.80/hr |
| p3.16xlarge | 64 | 488 GiB | V100 (x8) | $11.20/hr |
| g6.48xlarge | 192 | 768 GiB | L4 (x8) | $11.60/hr |
| g5.48xlarge | 192 | 768 GiB | A10G (x8) | $11.60/hr |
| p3dn.24xlarge | 96 | 768 GiB | V100 (x8) | $12.80/hr |
Google Cloud:

| Instance type | CPU | Memory | GPU | Price* |
|---|---|---|---|---|
| g2-standard-4 | 4 | 16 GiB | nvidia-l4 | $0.45/hr |
| g2-standard-8 | 8 | 32 GiB | nvidia-l4 | $0.65/hr |
| g2-standard-12 | 12 | 48 GiB | nvidia-l4 | $0.85/hr |
| a2-ultragpu-1g | 12 | 170 GiB | nvidia-a100-80gb | $0.85/hr |
| g2-standard-16 | 16 | 64 GiB | nvidia-l4 | $1.05/hr |
| ct5lp-hightpu-1t | 24 | 48 GiB | ct5lp | $1.45/hr |
| ct5l-hightpu-1t | 24 | 48 GiB | ct5l | $1.45/hr |
| a2-highgpu-1g | 12 | 85 GiB | nvidia-tesla-a100 | $1.60/hr |
| g2-standard-24 | 24 | 96 GiB | nvidia-l4 (x2) | $1.70/hr |
| a2-ultragpu-2g | 24 | 340 GiB | nvidia-a100-80gb (x2) | $1.70/hr |
| g2-standard-32 | 32 | 128 GiB | nvidia-l4 | $1.85/hr |
| a2-highgpu-2g | 24 | 170 GiB | nvidia-tesla-a100 (x2) | $3.20/hr |
| g2-standard-48 | 48 | 192 GiB | nvidia-l4 (x4) | $3.40/hr |
| a2-ultragpu-4g | 48 | 680 GiB | nvidia-a100-80gb (x4) | $3.40/hr |
| a2-highgpu-4g | 48 | 340 GiB | nvidia-tesla-a100 (x4) | $6.40/hr |
| ct5lp-hightpu-4t | 112 | 192 GiB | ct5lp (x4) | $6.60/hr |
| ct5l-hightpu-4t | 112 | 192 GiB | ct5l (x4) | $6.60/hr |
| g2-standard-96 | 96 | 384 GiB | nvidia-l4 (x8) | $6.80/hr |
| a2-ultragpu-8g | 96 | 1.33 TiB | nvidia-a100-80gb (x8) | $6.80/hr |
| a3-megagpu-8g | 208 | 1.83 TiB | nvidia-h100-mega-80gb (x8) | $12.40/hr |
| a3-highgpu-8g | 208 | 1.83 TiB | nvidia-h100-80gb (x8) | $12.40/hr |
| a2-highgpu-8g | 96 | 680 GiB | nvidia-tesla-a100 (x8) | $12.80/hr |
| ct5lp-hightpu-8t | 224 | 384 GiB | ct5lp (x8) | $13.20/hr |
| ct5l-hightpu-8t | 224 | 384 GiB | ct5l (x8) | $13.20/hr |
| a2-megagpu-16g | 96 | 1.33 TiB | nvidia-tesla-a100 (x16) | $20.80/hr |
Azure:

| Instance type | CPU | Memory | GPU | Price* |
|---|---|---|---|---|
| Standard_NC4as_T4_v3 | 4 | 28 GiB | T4 | $0.45/hr |
| Standard_NC8as_T4_v3 | 8 | 56 GiB | T4 | $0.65/hr |
| Standard_NV12ads_A10_v5 | 12 | 110 GiB | A10 | $0.85/hr |
| Standard_NC16as_T4_v3 | 16 | 110 GiB | T4 | $1.05/hr |
| Standard_NV18ads_A10_v5 | 18 | 220 GiB | A10 | $1.15/hr |
| Standard_NC24ads_A100_v4 | 24 | 220 GiB | A100 | $1.45/hr |
| Standard_NV36ads_A10_v5 | 36 | 440 GiB | A10 | $2.05/hr |
| Standard_NV36adms_A10_v5 | 36 | 880 GiB | A10 | $2.05/hr |
| Standard_NC48ads_A100_v4 | 48 | 440 GiB | A100 | $2.65/hr |
| Standard_NC64as_T4_v3 | 64 | 440 GiB | T4 | $3.45/hr |
| Standard_NV72ads_A10_v5 | 72 | 880 GiB | A10 | $3.85/hr |
| Standard_NC96ads_A100_v4 | 96 | 880 GiB | A100 | $5.05/hr |
| Standard_ND96asr_A100_v4 | 96 | 900 GiB | A100 | $5.05/hr |
| Standard_ND96amsr_A100_v4 | 96 | 1.86 TiB | A100 | $5.05/hr |
Limits#
You can set cost controls for an entire Coiled workspace and for individual users within a workspace. This gives you fine control over how much users are able to spend each month.
See Managing resources for more details on workspace management and cost controls.
Avoid Idle Resources#
Idle timeout. Cloud resources take a couple of minutes to spin up, so keeping them running is often desirable for ad hoc, interactive work. However, there’s usually a balance to strike between keeping cloud resources ready for rapid use and paying for idle hardware. Coiled APIs have an idle_timeout parameter that controls how long cloud resources can sit idle before being automatically shut down.
@coiled.function(
    idle_timeout="5 minutes",  # Shut down after 5 minutes of idleness
    ...
)
def predict(filename):
    ...

cluster = coiled.Cluster(
    idle_timeout="10 minutes",  # Shut down after 10 minutes of idleness
    ...
)

$ coiled notebook start \
    --idle-timeout "1 hour"  # Shut down after 1 hour of idleness
Adaptive Scaling. Coiled clusters and functions support adaptive scaling (autoscaling): cloud resources are scaled up and down to match the size of your workload. This lets you scale up during compute-intensive portions of a workflow and automatically scale back down once the work is done, avoiding idle resources.
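For example, on a cluster you can set scaling bounds and let Coiled adjust the number of workers within them (the bounds below are illustrative):

import coiled

cluster = coiled.Cluster(worker_vm_types="g5.xlarge")

# Autoscale between 10 and 100 workers depending on workload
cluster.adapt(minimum=10, maximum=100)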
Use Cases and Examples#
For concrete use cases and fully worked examples with GPUs in action on Coiled, see the examples in the Coiled documentation.