Integrations
It is easy to integrate Coiled with different systems, including:

- Developer environments like cloud notebooks and IDEs
- Production systems like Airflow, Prefect, Dagster, cron, and Jenkins
- CI/CD systems like GitHub Actions
- … and much more
This is because Coiled is API-first, driven primarily through its Python interface:
import coiled
cluster = coiled.Cluster(...)
This means that Coiled can be invoked from anywhere that runs Python.
Coiled feels more like using a library (like pandas or numpy) than like using a platform (like Databricks or Kubernetes).
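For instance, here is a minimal sketch of what this looks like from inside a workflow manager like Prefect. The process_one function, the flow structure, and the cluster size are illustrative assumptions, not a prescribed pattern:

from prefect import flow, task
import coiled

def process_one(filename):
    ...  # placeholder for your per-file logic

@task
def process_files(filenames):
    # Create cloud machines for the duration of this task,
    # then shut them down when the block exits
    with coiled.Cluster(n_workers=10) as cluster:
        client = cluster.get_client()
        futures = client.map(process_one, filenames)
        return client.gather(futures)

@flow
def pipeline(filenames):
    return process_files(filenames)

The same two lines of Coiled code work unchanged in Airflow, Dagster, a cron script, or a CI job.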
Example: Cloud Notebooks
It is easy to integrate Coiled with your favorite cloud notebook provider. These are the same steps you would follow locally:
1. Install the coiled Python library:

   pip install coiled

2. Authenticate your cloud notebook environment with Coiled:

   coiled login

   This stores an API token in your notebook’s persistent drive.

3. Use Coiled in your notebook:

   import coiled
   cluster = coiled.Cluster(...)
Your software packages and cloud credentials will be copied from your cloud notebook environment and replicated on the remote machines.
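Putting those steps together, a complete notebook session might look like the following minimal sketch. The score function, the data, and the cluster sizing are illustrative assumptions:

import coiled

def score(item):
    ...  # placeholder for your per-item logic

items = [...]  # your data

# Ask Coiled for cloud machines; sizing here is illustrative
cluster = coiled.Cluster(n_workers=20, worker_memory="16 GiB")
client = cluster.get_client()

futures = client.map(score, items)
results = client.gather(futures)

cluster.shutdown()  # release the cloud machines when finished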
Users familiar with Coiled will notice that this process is identical to using Coiled from a personal laptop. There is no difference between using Coiled from different locations (cloud, local, HPC) or development environments (notebooks, VS Code, workflow managers).
Example: AWS Lambda
For example, let’s say that you want to integrate Coiled with AWS Lambda to augment your Lambda functions with larger cloud hardware (Lambda functions are constrained in the memory and compute available to them).
You start with your normal AWS Lambda Python script:
def lambda_handler(event, context):
    # List filenames to process
    filenames = event.get('filenames')

    # Process all filenames (this is slow and sometimes runs out of memory)
    results = []
    for filename in filenames:
        data = load(filename)
        result = process(data)
        results.append(result)

    # Return a response
    return {
        'statusCode': 200,
        'body': results,
    }
Because we have access to Python in this Lambda function, we can invoke Coiled to ask for more and larger machines:
import coiled

def lambda_handler(event, context):
    # List filenames to process
    filenames = event.get('filenames')

    # Ask for larger machines
    cluster = coiled.Cluster(worker_memory="512 GiB")
    client = cluster.get_client()

    # Process all files on these larger machines, gather results back
    def compute(filename):
        data = load(filename)
        return process(data)

    tasks = client.map(compute, filenames)
    results = client.gather(tasks)

    # Return a response
    return {
        'statusCode': 200,
        'body': results,
    }
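One note: as written, the handler leaves the cluster running after it returns (Coiled shuts clusters down on its own after an idle period). If you would rather release the machines as soon as the work finishes, one option is to use the cluster as a context manager, as in this sketch:

import coiled

def lambda_handler(event, context):
    filenames = event.get('filenames')

    def compute(filename):
        data = load(filename)
        return process(data)

    # The context manager shuts the cluster down when the block exits
    with coiled.Cluster(worker_memory="512 GiB") as cluster:
        client = cluster.get_client()
        tasks = client.map(compute, filenames)
        results = client.gather(tasks)

    return {'statusCode': 200, 'body': results}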
This pattern is common. We use Coiled together with existing production systems to augment those systems with greater scale on the cloud. A few questions come up regularly with these integrations:
Q: How do I set a Coiled API key?
A: You can set an API key with the environment variable DASK_COILED__TOKEN.
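For example, you can set this in your platform’s environment configuration, or from Python before importing coiled so that Dask’s configuration picks it up. The token value below is a placeholder:

import os

# Placeholder token; in production, inject this through your platform's
# secret manager or environment configuration rather than hard-coding it
os.environ["DASK_COILED__TOKEN"] = "<your-coiled-api-token>"

import coiled  # imported after the variable is set so the token is picked up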
Q: How do I specify a production software environment for the Coiled workers?
A: You’ve already set a production software environment in the location where you’re invoking the Coiled cluster (like the AWS Lambda environment above). Coiled will replicate that exact locked-down environment on the remote machines automatically.

Q: How do I ensure Coiled has the right permissions to access my data?
A: If your data is stored on cloud storage then Coiled will replicate the credentials in the hosting environment (AWS Lambda in this case) and forward those credentials to the workers. If your permissions depend on environment variables (common with databases like Snowflake, for example) you can send those to the workers with the coiled.Cluster.send_private_envs method.
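For example, here is a minimal sketch of the Snowflake case; the environment variable name and cluster sizing are illustrative assumptions:

import os
import coiled

cluster = coiled.Cluster(n_workers=4)

# Forward a secret from this environment to the workers;
# the variable name here is an illustrative assumption
cluster.send_private_envs({"SNOWFLAKE_PASSWORD": os.environ["SNOWFLAKE_PASSWORD"]})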
Conclusion
Coiled integrations are easy because Coiled operates as just a Python API. In this way Coiled feels more like a library (like pandas or numpy) than a platform (like Databricks or Kubernetes).

Coiled is designed to be used anywhere you can run Python, so in most cases the answer to the question “Can I integrate Coiled with X?” is “yes, just import coiled in that system and go.”