Easy Heavyweight Serverless Functions#
tl;dr#
What is the easiest way to run Python code in the cloud, especially for compute jobs?
We briefly compare common options (Lambda, EC2, Fargate, Modal) and then pitch a new contender: Coiled Run.
Motivation to Run Python Code in the Cloud#
We want to run jobs in the cloud for a few common reasons:
Data proximity: Our data lives in the cloud, and we want to process it where it lives to avoid costly egress charges
Speed and Scale: We have many things to process and don’t want to wait
Hardware access: We want GPUs or big-memory machines
Always-on Coordination: We want to respond to actions in the real world, like in a web server or in response to data arriving
This post focuses on the first three use cases, which require more heavyweight computing. As an example, maybe we want to load, manipulate, and store many files living in cloud storage:
import pandas as pd
import s3fs

s3 = s3fs.S3FileSystem()

def process(filename):  # Expensive function
    df = pd.read_csv(filename)
    df = df[df.value > 0]
    df.to_parquet(filename[:-4] + ".parquet")

filenames = s3.glob("s3://my-bucket/*.csv")  # Lots of files
for filename in filenames:  # Run many times
    process("s3://" + filename)  # s3fs.glob returns paths without the scheme
Let’s see what kinds of options we have.
Cloud Infrastructure to Run Single Jobs#
In general we have two classes of options:
Big VMs (EC2, ECS) which hand you a whole machine to manage
Pros:
Hardware flexibility: you can use any hardware available in the cloud
Cheap: typically $0.05 per CPU-hour
Cons:
Annoying to set up / inaccessible
Takes at least a minute to turn on (but more likely an hour of human time)
Serverless technologies (AWS Lambda) which run a function for you, hiding the VM
Pros:
Much easier to use
Turns on quickly (seconds)
Cons:
Easier, but not yet easy. These are still kinda annoying to set up.
Hardware limitations
Expensive: typically $0.20 per CPU-hour, can’t use Spot
Most people I see using the cloud for ad-hoc computing choose VMs. They spin up a big instance, use it for a while, and then spin it down (hopefully). People who start with serverless tend to move over to VMs, usually due to some flexibility limitation. AWS Lambda is often great, except if any of the following are true:
You need a big machine
You want accelerated hardware like GPUs
You have a large software environment
Your functions last tens of minutes
Your functions last more than a minute, and you’re cost conscious
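To make the price gap concrete, here is a back-of-the-envelope sketch using the two rates quoted above ($0.05 and $0.20 per CPU-hour). The workload (10,000 tasks of three minutes each on one CPU) and the function name `job_cost` are made up for illustration, and a real bill would also include storage, network, and idle time.

```python
# Rough cost comparison using the per-CPU-hour rates quoted in this post.
VM_RATE = 0.05          # dollars per CPU-hour on a plain VM
SERVERLESS_RATE = 0.20  # dollars per CPU-hour on a serverless platform


def job_cost(n_tasks, minutes_per_task, cpus_per_task, rate):
    """Return the compute cost in dollars for a batch of identical tasks."""
    cpu_hours = n_tasks * (minutes_per_task / 60) * cpus_per_task
    return cpu_hours * rate


# Hypothetical workload: 10,000 files, 3 minutes each, 1 CPU per task.
vm_cost = job_cost(10_000, 3, 1, VM_RATE)
serverless_cost = job_cost(10_000, 3, 1, SERVERLESS_RATE)

print(f"VM:         ${vm_cost:,.2f}")
print(f"Serverless: ${serverless_cost:,.2f}")
print(f"Ratio:      {serverless_cost / vm_cost:.1f}x")
```

For short, occasional functions the difference is pocket change. For hundreds of CPU-hours it starts to matter.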
Third party orchestrators#
Products like Argo, Prefect, Airflow, Coiled+Dask (us!), Kubeflow, Anyscale+Ray, Dagster and more will also happily run jobs for you, but only as part of a broader abstraction. If you just want to run a single job, they’re all overkill.
The best product experience I’ve seen in this space is Modal Labs. It’s pretty slick. Like the others, though, it still has non-trivial ceremony. It also has some other limitations: you have to run in their cloud accounts (so data privacy is a concern), it runs in only one region (so data egress costs can be non-trivial), and it costs $0.20 per CPU-hour (a 4x markup over the VM price above). It’s generally a great product experience, though, and we hear only positive things. I encourage people to check them out.
Coiled Run#
So now we get to our new solution. There are two APIs, and they look like this:
CLI:
$ coiled run echo "Hello, world"
Python:
import coiled

@coiled.run(memory="512 GiB", region="us-west-2")
def f():
    return "Hello, world!"

f()
Coiled was originally built to deploy distributed Dask clusters in the cloud, a harder problem than serverless execution. However, the infrastructure we built for Dask transfers over surprisingly well. We started this effort because we saw Coiled users spinning up single-node Dask clusters (rather than hundred-node clusters) to run a single task (rather than millions). We took the hint and built these new APIs on top of the existing infrastructure. Here you go:
Run a script#
The easiest thing is to run a standalone script in the cloud. Here we run a script in Frankfurt on a machine with 128 cores.
$ coiled run python myscript.py --region eu-central-1 --vm-type m6i.32xlarge
Optionally specify useful keywords:
$ coiled run python myscript.py \
    --region us-east-1 \
    --vm-type m6i.32xlarge

# g5.2xlarge is a GPU instance type
$ coiled run python pytorch-train.py \
    --region us-west-1 \
    --vm-type g5.2xlarge \
    --container ...
It doesn’t even need to be a Python script:
$ coiled run echo "Hello, world"
Coiled creates a VM, synchronizes all of your local software packages, cloud credentials, files, etc., runs your script, and then shuts down the VM.
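The script itself needs nothing Coiled-specific. As a rough sketch, a standalone script like `myscript.py` might look like the following. Everything in it is an illustrative assumption rather than something from Coiled: the `keep_positive` helper, the `value` column, and the use of the standard-library `csv` module (instead of pandas) to keep the example dependency-free.

```python
import csv
import sys
from pathlib import Path


def keep_positive(src: Path) -> Path:
    """Copy rows whose `value` column is > 0 into a sibling CSV file."""
    dst = src.with_name(src.stem + ".filtered.csv")
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if float(row["value"]) > 0:
                writer.writerow(row)
    return dst


def main(paths):
    # Process every file named on the command line and report the outputs.
    for path in paths:
        print(keep_positive(Path(path)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Because this is a plain Python script, it runs the same way on a laptop as it would on the VM that Coiled creates.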
Here is a more fully worked example processing parquet data on S3
Run a single function#
Alternatively, you can run your Python code locally but decorate individual functions to run remotely, as with my_function below.
import coiled

@coiled.run(
    memory="256 GiB",
    region="us-east-1",
)
def my_function(...):
    return ...

result = my_function(*args, **kwargs)
Here are some more fully worked examples.
Pros and Cons of Coiled Run#
Coiled Run: Accidental, but easy and efficient cloud functions
Pros:
Easiest API we can find
Any hardware / architecture (uses EC2 instances)
Runs securely in your account
Runs in any region, close to your data
Base cloud cost (uses EC2 instances)
Cons:
Minute-long start times for your first function call (but subsequent calls have only millisecond delays)
Third-party service (requires separate sign-up/accounting)
This essentially offers the flexibility and cost efficiency of EC2 instances, but without the setup pain.
We think that the UX is smooth, mostly inheriting from other features we built for Dask clusters like the following:
Automatic package synchronization: we transmit the same package versions you have locally to your cloud workers
Credential forwarding: we securely forward your AWS permissions to remote workers (safely using AWS STS)
Cost/user management controls: we help you constrain costs on a per-user basis
General progress / UX polish: things just look nice.
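To give a feel for what "the same package versions you have locally" means, here is a minimal sketch that snapshots the packages installed in the current environment using only the standard library. This is not how Coiled implements package synchronization (the post doesn't describe that); the function names `local_package_versions` and `as_requirements` are hypothetical and only illustrate the idea of capturing a local environment as pinned versions.

```python
from importlib.metadata import distributions


def local_package_versions() -> dict:
    """Map each installed distribution name (lowercased) to its version."""
    versions = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        version = dist.version
        if name and version:  # skip distributions with broken metadata
            versions[name.lower()] = version
    return versions


def as_requirements(versions: dict) -> list:
    """Render a name-to-version mapping as sorted, pinned requirement lines."""
    return [f"{name}=={version}" for name, version in sorted(versions.items())]


if __name__ == "__main__":
    # Print the first few pins from the local environment.
    for line in as_requirements(local_package_versions())[:5]:
        print(line)
```

A list like this is the kind of information a remote machine would need in order to rebuild a matching environment.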
Opportunity#
We’re excited about Coiled functions because they are …
Easy to use for Python developers unfamiliar with cloud computing
Flexible with their ability to use any hardware in any region
Cheap, being backed by foundational technologies like EC2 Fleets
Dask is super-powerful (far more powerful than what’s described in this post) but there are lots of folks for whom Dask is overkill. We hope that these simpler interfaces are a good onramp to data-proximate and parallel cloud computing.
Interested in playing? The following should work if you have cloud credentials:
pip install coiled
coiled setup # this will connect Coiled to your cloud account
coiled run echo "Hello, world"
For more on how to get started, see