GCP Backend

Using Coiled with GCP

You can configure Coiled to launch Dask clusters and run computations on Google Cloud Platform (GCP), either within Coiled’s GCP account or within your own GCP account.

Note

GCP support is currently experimental with new features under active development.

Tip

In addition to the usual cluster logs, our current GCP backend support also includes system-level logs. This provides rich insight into any potential issues while GCP support is still experimental.

Using Coiled’s GCP Account

You can configure Coiled to launch Dask clusters and run computations within Coiled’s Google Cloud account. This makes it easy for you to get started quickly, without needing to set up any additional infrastructure outside of Coiled.

../_images/backend-coiled-gcp-vm.png

To use Coiled on GCP, log in to your Coiled account and access your dashboard. Click on Account on the left navigation bar, then click the Edit button to configure your Cloud Backend Options:

../_images/cloud-backend-options.png

On the Select Your Cloud Provider step, select the GCP option, then click the Next button:

../_images/cloud-backend-provider-gcp.png

On the Configure GCP step, select the GCP region that you want to use by default (i.e., when a region is not specified in the Coiled Python client). Continue by selecting Launch in Coiled's GCP Account and clicking the Next button. Finally, select the registry you wish to use, then click the Submit button.

Coiled is now configured to use GCP!

From now on, when you create Coiled clusters, they will be provisioned in Coiled’s GCP account.

Using your own GCP Account

Alternatively, you can configure Coiled to create Dask clusters and run computations entirely within your own GCP account (within a project of your choosing). This allows you to make use of security/data access controls, compliance standards, and promotional credits that you already have in place within your GCP account.

../_images/backend-external-gcp-vm.png

Note that when running Coiled on your GCP account, Coiled Cloud is only responsible for provisioning cloud resources for Dask clusters that you create. Once a Dask cluster is created, all computations, data transfer, and Dask client-to-scheduler communication occur entirely within your GCP account.

Note

The ability to configure Coiled to run in your own GCP account is currently only available to early-adopter users. Contact Coiled Support to request access.

Step 1: Obtain GCP service account credentials

Coiled provisions resources on your GCP account through the use of a service account that is associated with a custom IAM role (which will be created in the next step).

In this step, you can use the GCP Console to create a new service account (or select an existing service account) that will be used with Coiled.

Once you have created or identified a GCP service account for working with Coiled, you’ll need to create a new (or use an existing) JSON service account key. Follow the steps in the GCP documentation to create and manage a service account key.

After you create a JSON service account key, the key will be saved to your local machine with a file name such as gcp-project-name-d9e9114d534e.json with contents similar to:

{
  "type": "service_account",
  "project_id": "project-id",
  "private_key_id": "25a2715d43525970fe7c05529f03e44a9e6488b3",
  "private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvQIBADANBgkqhki...asSSS5J4526eqmrkb1OA=\n-----END PRIVATE KEY-----\n",
  "client_email": "service-account-name@project-name.iam.gserviceaccount.com",
  "client_id": "102238688522576776582",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-name%40project-name.iam.gserviceaccount.com"
}

Keep your JSON service account key handy since you’ll use it in Coiled Cloud in a later step.
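Before uploading the key, it can help to confirm that the file is complete. The following is a minimal sketch, not part of Coiled: the helper names and the choice of required fields are assumptions based only on the example key shown above.

```python
import json
from pathlib import Path

# Fields from the example key above that the later steps rely on.
# This selection is an assumption for illustration, not a Coiled requirement.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key: dict) -> list[str]:
    """Return a list of problems found in a parsed key (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    kind = key.get("type")
    if kind and kind != "service_account":
        problems.append(f"unexpected type: {kind!r}")
    return problems


def load_service_account_key(path: str) -> dict:
    """Read a JSON key file and raise ValueError if it looks incomplete."""
    key = json.loads(Path(path).read_text())
    problems = check_service_account_key(key)
    if problems:
        raise ValueError("; ".join(problems))
    return key
```

For example, load_service_account_key("gcp-project-name-d9e9114d534e.json") returns the parsed key, whose project_id and client_email values are the ones needed in the gcloud commands in the following steps.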

Step 2: Create a custom IAM role

Coiled requires a limited set of IAM permissions to be able to provision infrastructure and compute resources in your GCP account. You’ll need to create a new IAM role and assign the appropriate set of permissions to it.

In this step, you’ll create a new IAM role by following the steps in the GCP documentation on creating a custom role. Specify an IAM role name such as coiled that will make it easy to locate in the next step.

Rather than manually adding each permission in the GCP console interface, we recommend that you use the gcloud command-line tool to create the custom role from a YAML file that lists the required permissions.

Step 3: Connect the service account to the role

Once you’ve created a service account and a custom IAM role to use with Coiled, you can bind the service account to the custom role via the GCP Cloud Console or using the gcloud command-line tool, in a terminal, as in:

gcloud projects add-iam-policy-binding <PROJECT-ID> \
    --member=serviceAccount:<CLIENT-EMAIL> \
    --role=projects/<PROJECT-ID>/roles/coiled
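The placeholders in the command above correspond to the project_id and client_email fields of the JSON key from Step 1. As a rough sketch (the helper name is hypothetical, and the role ID coiled assumes you used that name in Step 2), the command can be assembled from a parsed key like this:

```python
import shlex


def role_binding_command(key: dict, role_id: str = "coiled") -> list[str]:
    """Build the add-iam-policy-binding command from a parsed service account key."""
    project_id = key["project_id"]
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{key['client_email']}",
        f"--role=projects/{project_id}/roles/{role_id}",
    ]


def as_shell(argv: list[str]) -> str:
    """Render an argument list as a single shell-safe command line."""
    return shlex.join(argv)
```

Printing as_shell(role_binding_command(key)) gives a line you can review and paste into a terminal; the sketch only builds the command and does not run it.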

Step 4: Configure Google Artifact Registry

If you want to store the Docker images for your software environments in your own GCP account, Coiled will store them in Google Artifact Registry (GAR). If you want to store your software environments in Docker Hub or another external Docker registry, you can skip this step and configure the registry settings in the next step.

In this step, you’ll enable the Google Artifact Registry API, create a GAR repository for Coiled, and create an IAM policy binding that grants limited access to the service account for Coiled. Using this configuration, Coiled will not have access to any other repositories in your GCP account, and Coiled does not require admin-level permissions to enable APIs or create repositories.

To enable the Google Artifact Registry API, run the following gcloud command in a terminal:

gcloud services enable --project=<PROJECT_ID> artifactregistry.googleapis.com

Create a GAR repository for Coiled to use by running the following command in a terminal. Note that the repository must be named coiled exactly as shown, and that the location should be one that we currently support: us-east1 or us-central1. If you’d like to use a different region, please get in touch with Coiled Support.

gcloud artifacts repositories create coiled \
  --project=<PROJECT_ID> \
  --repository-format=docker \
  --location=<REGION>

Finally, grant the service account access to the repository you just created:

gcloud artifacts repositories add-iam-policy-binding coiled \
   --project=<PROJECT_ID> \
   --location=<REGION> \
   --member=serviceAccount:<CLIENT-EMAIL> \
   --role=roles/artifactregistry.repoAdmin

Note

Ensure that the region specified in the location option is the same region that you will use when configuring your Coiled Cloud backend in the next step. If you want to store software environments in multiple regions, then you can repeat these commands with the desired REGION.
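If you repeat these commands for more than one region, a small helper can keep the arguments consistent. This is a sketch under stated assumptions: the function name is hypothetical, and the supported-region list is copied from the text above and may be out of date.

```python
# Regions named as supported in this step; treat this list as an assumption.
SUPPORTED_REGIONS = ("us-east1", "us-central1")


def artifact_registry_commands(project_id: str, client_email: str, region: str) -> list[list[str]]:
    """Return the three gcloud commands from this step for a single region."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"region {region!r} is not one of {SUPPORTED_REGIONS}")
    return [
        # Enable the Artifact Registry API for the project.
        ["gcloud", "services", "enable", f"--project={project_id}",
         "artifactregistry.googleapis.com"],
        # Create the Docker repository, which must be named "coiled".
        ["gcloud", "artifacts", "repositories", "create", "coiled",
         f"--project={project_id}", "--repository-format=docker",
         f"--location={region}"],
        # Grant the service account access to that repository only.
        ["gcloud", "artifacts", "repositories", "add-iam-policy-binding", "coiled",
         f"--project={project_id}", f"--location={region}",
         f"--member=serviceAccount:{client_email}",
         "--role=roles/artifactregistry.repoAdmin"],
    ]
```

The API only needs to be enabled once per project, so when looping over several regions you can skip the first command after the first pass.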

Note

The policy binding can take a few minutes to propagate (anecdotally, about 2 to 5 minutes). If you complete the next step quickly and get an error related to Google Artifact Registry, wait a few minutes and try again.

Step 5: Configure Coiled Cloud backend

Now you’re ready to configure the cloud backend in your Coiled Cloud account to use your GCP account and GCP service account credentials.

To configure Coiled to use your GCP account, log in to your Coiled account and access your dashboard. Click on Account on the left navigation bar, then click the Edit button to configure your Cloud Backend Options:

../_images/cloud-backend-options.png

Note

You can configure a different cloud backend for each Coiled account (i.e., your personal/default account or your Team account). Be sure that you’re configuring the correct account by switching accounts at the top of the left navigation bar in your Coiled dashboard if needed.

On the Select Your Cloud Provider step, select the GCP option, then click the Next button:

../_images/cloud-backend-provider-gcp.png

On the Configure GCP step, select the GCP region that you want to use by default (i.e., when a region is not specified in the Coiled Python client). Then choose the Launch in my GCP account option, add your JSON service account key file, and click the Next button.

../_images/cloud-backend-credentials-gcp.png

On the Container Registry step, select where you want to store Coiled software environments, then click the Next button:

../_images/cloud-backend-registry-gcp.png

Review the cloud backend provider options that you’ve configured, then click on the Submit button:

../_images/cloud-backend-review-gcp.png

Coiled is now configured to use your GCP account!

From now on, when you create Coiled clusters, they will be provisioned in your GCP account.

Step 6: Create a Coiled cluster

Now that you’ve configured Coiled to use your GCP account, you can create a cluster to verify that everything works as expected.

To create a Coiled cluster, follow the steps listed in the quick start on your Coiled dashboard, or follow the steps listed in the Getting Started documentation, both of which will walk you through installing the Coiled Python client and logging in, then running a command such as:

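import coiled
from dask.distributed import Client

# Launch a small cluster in the cloud backend you configured
cluster = coiled.Cluster(n_workers=1)

# Connect a Dask client to the cluster
client = Client(cluster)
print("Dashboard:", client.dashboard_link)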

Note

If you’re using a Team account in Coiled, be sure to specify the account= option when creating a cluster, as in:

cluster = coiled.Cluster(n_workers=1, account="my-team-account-name")

Otherwise, the cluster will be created in your personal/default account in Coiled, which you can access by switching accounts at the top of the left navigation bar in your Coiled dashboard.

Once your Coiled cluster is up and running, you can run a sample calculation on your cluster to verify that it’s functioning as expected, such as:

import dask.dataframe as dd

# Load a public dataset and keep it in cluster memory
df = dd.read_csv(
    "s3://nyc-tlc/trip data/yellow_tripdata_2019-*.csv",
    dtype={
        "payment_type": "UInt8",
        "VendorID": "UInt8",
        "passenger_count": "UInt8",
        "RatecodeID": "UInt8",
    },
    storage_options={"anon": True},
    blocksize="16 MiB",
).persist()

# Average tip amount by passenger count
df.groupby("passenger_count").tip_amount.mean().compute()

At this point, Coiled will have created resources within your GCP account that are used to power your Dask clusters.

Backend options

There are several GCP-specific options that you can specify (listed below) to customize Coiled’s behavior. Additionally, the next section contains an example of how to configure these options in practice.

Name    Description                                                      Default
------  ---------------------------------------------------------------  ----------
region  GCP region to create resources in                                us-east1
zone    GCP zone to create resources in                                  us-east1-c
spot    Whether or not to use preemptible instances for cluster workers  True

The GCP backend for Coiled uses preemptible instances for the workers by default. Note that GCP might stop preemptible instances at any time and always stops preemptible instances after they run for 24 hours.

Example

You can specify backend options directly in Python:

import coiled

cluster = coiled.Cluster(backend_options={"region": "us-central1", "spot": False})

Or save them to your Coiled configuration file:

# ~/.config/dask/coiled.yaml

coiled:
  backend-options:
    region: us-central1

to have them used as the default value for the backend_options= keyword:

import coiled

cluster = coiled.Cluster()
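A typo in an option name or a zone that does not belong to the chosen region is easier to catch before a cluster is requested. The check below is an illustrative sketch only: the helper is hypothetical, the option list is copied from the table above and may not be exhaustive, and it relies on GCP zone names starting with their region name (for example, us-east1-c is in us-east1).

```python
# Option names and value types taken from the backend options table above.
KNOWN_OPTIONS = {"region": str, "zone": str, "spot": bool}


def check_backend_options(options: dict) -> list[str]:
    """Return a list of problems found in a backend_options dict (empty if none)."""
    problems = []
    for name, value in options.items():
        expected = KNOWN_OPTIONS.get(name)
        if expected is None:
            problems.append(f"unknown option: {name}")
        elif not isinstance(value, expected):
            problems.append(f"{name} should be {expected.__name__}")
    region, zone = options.get("region"), options.get("zone")
    # Only compare region and zone when both were given explicitly.
    if isinstance(region, str) and isinstance(zone, str) and not zone.startswith(region + "-"):
        problems.append(f"zone {zone} is not in region {region}")
    return problems
```

You could call check_backend_options on the same dict you pass as backend_options= and raise an error if the returned list is not empty.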

GPU support

This backend allows you to run computations with GPU-enabled machines if your account has access to GPUs. See the GPU best practices documentation for more information on using GPUs with this backend.

Workers currently have access to a single GPU. If you try to create a cluster with more than one GPU per worker, the cluster will not start and an error will be returned.

Coiled logs

If you are running Coiled on your GCP account, the logs from Clusters and Jobs will be saved within your GCP account. Coiled will send logs to both GCP Logging and GCP Cloud Storage.

We send logs to GCP Logging so that you can easily view logs with GCP Logs Explorer, and we use GCP Cloud Storage to back the log views we display on the Cluster Dashboard.

Log Storage        Storage time
-----------------  ------------
GCP Logging        30 days
GCP Cloud Storage  90 days

When you configure your backend to use GCP, Coiled creates a bucket named coiled-logs in both GCP Logging and GCP Cloud Storage. In the GCP Cloud Storage bucket we create a coiled.log file for each instance created, organized into folders following the structure <account>/<instance>/coiled.logs.
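To locate the log for a particular instance, you can build its Cloud Storage path from the folder structure described above. This is a small sketch with a hypothetical helper; the default file name is copied from the structure string as written, so adjust it if the objects in your bucket are named differently.

```python
def coiled_log_uri(account: str, instance: str,
                   filename: str = "coiled.logs",
                   bucket: str = "coiled-logs") -> str:
    """Build the gs:// URI for one instance's log, following <account>/<instance>/<file>."""
    return f"gs://{bucket}/{account}/{instance}/{filename}"
```

The resulting URI can be passed to any tool that reads from Cloud Storage, such as the Cloud Console object browser or a storage client library.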