Dask Deployment Options

There are many ways to deploy Dask! They’re all great choices depending on what you need.

We think Coiled is just a little bit more great, though, so we’d like to compare it against some of the more common approaches we see today. The sections below cover these options roughly in order of decreasing popularity.

Personal Laptop

Personal laptops are great!

If this works for you then definitely stay here and don’t use Coiled. Computing on faraway computers is a pain.

The main reasons to switch to Coiled are if …

  • You need more computing power

  • You need to be closer to cloud data

  • You need hardware your laptop doesn’t possess, like a GPU or more memory

If none of those are true, then please keep things simple.

Single Big VM

What it is

The most common thing we displace as a company isn’t Databricks or a fancy Kubernetes cluster; it’s the Single Big VM. Someone in your group spins up a big EC2 instance and everyone logs into it to run their big jobs. When you’re on that machine and want to use all of it, you spin up a Dask LocalCluster:

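from dask.distributed import LocalCluster

# Start a scheduler and workers that together use all the cores on this machine
cluster = LocalCluster()
# Connect a client so Dask computations run on that local cluster
client = cluster.get_client()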

And suddenly you’ve got access to 100 cores.

Why it’s great

It’s a great feeling to have 100 cores, and it works much like what you’re used to. Yes, you have to ssh into the machine, but that’s familiar. You’ve got a hard drive and don’t have to deal with S3. You’ve got IPython or maybe a Jupyter server running. You can pip/conda install things and stuff just works.

Why Coiled might be even better

The main reasons why we see people shift from the Single Big VMs to Coiled are the following:

  1. Scale: At some point the single machine just can’t deliver enough computing power. Often the limit people hit first is memory, but many other things can get in the way.

    Coiled does distributed computing, so it can easily give you 1000 machines of that size if you need them.

  2. Cost: The Single Big VM approach is surprisingly expensive, often more expensive than the distributed computing approach, because people typically leave these machines on 24/7.

    A 128-core m6i.32xlarge costs $6.144 per hour, or roughly $54k per year when left running around the clock. If you do distributed computing right, it can be much cheaper than this. We typically see customers cut their cloud computing costs by an order of magnitude when shifting away from these always-on Single Big VMs.

  3. Orchestration: As a team’s activity on this box grows, people start to collide with each other:

    “Hey, are you still using the big box? I have an urgent project I want to run”

    As costs grow and people start turning the machine off at night and on weekends, one person usually becomes the caretaker, and their job veers further toward cloud ops than they might like.
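To make the cost argument concrete, here is a back-of-the-envelope sketch in Python. The $6.144/hour m6i.32xlarge price comes from the text; the distributed usage pattern (eight 16-core machines, assumed to cost the same per core, running two hours per working day for 250 days a year) is purely a hypothetical assumption, so substitute your own workload.

```python
# Back-of-the-envelope annual cost comparison (illustrative assumptions only).

HOURS_PER_YEAR = 24 * 365


def annual_cost(price_per_hour: float, hours_per_year: float) -> float:
    """Yearly cost of capacity billed per running hour."""
    return price_per_hour * hours_per_year


# Always-on Single Big VM: 128-core m6i.32xlarge at $6.144/hour, never turned off.
BIG_VM_HOURLY = 6.144
BIG_VM_CORES = 128
always_on = annual_cost(BIG_VM_HOURLY, HOURS_PER_YEAR)

# Hypothetical distributed alternative: eight 16-core machines (128 cores total),
# ASSUMED to cost the same per core, running only while jobs run:
# 2 hours per working day, 250 working days per year.
price_per_core_hour = BIG_VM_HOURLY / BIG_VM_CORES
cluster_hourly = price_per_core_hour * 16 * 8
bursty = annual_cost(cluster_hourly, 2 * 250)

print(f"Always-on big VM: ${always_on:,.0f} per year")
print(f"Bursty cluster:   ${bursty:,.0f} per year")
print(f"Ratio:            {always_on / bursty:.1f}x")
```

Under these assumptions the always-on VM costs about $53.8k a year against roughly $3.1k for the bursty cluster. The comparison is deliberately simplistic: it ignores storage, data transfer, startup time, and discounts such as Spot pricing.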

Coiled combines a lot of the convenience of the Single Big VM with the scale and cost efficiency of the cloud:

  • Software: Coiled automatically replicates the software environment from your laptop, so you don’t have to deal with Docker or installing packages remotely

  • Cost: Coiled automatically turns off machines if you’re not using them, making things super cost efficient

  • Coordination: Coiled lets everyone create their own machines whenever they want, avoiding coordination challenges

  • Maintenance: Coiled is really easy to maintain, freeing whoever took responsibility for cloud infrastructure to go back to their actual job
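Automatically turning machines off comes down to an idle timeout. Below is a minimal, hypothetical sketch of that policy in plain Python; it is not Coiled’s implementation, and the names `IdleShutdown`, `record_activity`, and `should_shut_down` are invented for illustration. A fake clock is injected so the behavior can be checked instantly.

```python
import time
from typing import Callable


class IdleShutdown:
    """Hypothetical idle-timeout policy: a cluster should be shut down once it
    has seen no activity (e.g. no submitted work) for `timeout_s` seconds."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        # Treat cluster creation as the first activity.
        self.last_activity = clock()

    def record_activity(self) -> None:
        """Call whenever the user submits work or tasks are running."""
        self.last_activity = self.clock()

    def should_shut_down(self) -> bool:
        """True once the cluster has been idle for at least the timeout."""
        return self.clock() - self.last_activity >= self.timeout_s


# Drive the policy with a fake clock (in seconds) so the example runs instantly.
now = [0.0]
policy = IdleShutdown(timeout_s=20 * 60, clock=lambda: now[0])

now[0] = 10 * 60                   # 10 minutes idle
print(policy.should_shut_down())   # False
policy.record_activity()           # the user submits more work; timer resets
now[0] = 25 * 60                   # 15 minutes since that activity
print(policy.should_shut_down())   # False
now[0] = 31 * 60                   # 21 minutes since that activity
print(policy.should_shut_down())   # True
```

In practice a check like this would run periodically inside an always-on service, which would then tear down the corresponding cloud resources.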

Dask-Kubernetes

What it is

The second most common Dask deployment approach we see is dask-kubernetes or dask-gateway. The first assumes users have Kubernetes credentials and is a bit more Kubernetes-native, while the second doesn’t assume users have many credentials and offers a cleaner split between users and IT/ops.

The challenges and benefits are similar for both, so we’ve lumped them together.

Why it’s great

Kubernetes is incredibly powerful. It’s like Aladdin’s lamp in that it can accomplish magical wonders if given the right incantation of YAML.

Kubernetes is commonly already present in larger organizations. Some central IT department has provisioned a large Kubernetes cluster and allocates namespaces to different groups who want to run on central hardware. Kubernetes provides a convenient abstraction for Central IT to deliver hardware in an appropriately controlled and monitored environment. They can also share that hardware smoothly between many different groups and many different concurrent applications.

Kubernetes can also run either in the cloud or on-prem. The best technology to run Dask on-prem today is probably Kubernetes-based.

Why Coiled might be even better

If you’re on-prem, or at a big organization where Kubernetes is mandated by law, then you should probably use Kubernetes. If neither of those is true, then Kubernetes is sometimes a massive pain in the ass (in my personal experience, anyway).

Kubernetes is great when you want to run lots of microservices concurrently, with various interdependencies between them, under load that stays steady over weeks. That’s actually not what Dask users want to do. Dask users often want to get 1000 machines that all run the same Dask worker program for five minutes, and then they want those machines to go away. Kubernetes wasn’t built for bursty, ephemeral computing like this.

Typically people who shift from Kubernetes to Coiled do so for the following reasons:

  • Maintenance: there’s usually a software engineer who’s spending half their time managing the Kubernetes cluster and wants their life back.

  • Hardware Flexibility: Kubernetes deployments are kinda constraining. They typically restrict their Dask users to predefined profiles like Small/Medium/Large nodes. When users ask for something outside these profiles, like “Can I have a GPU?”, that typically requires non-trivial effort from the software engineer in charge of the Kubernetes cluster.

    Similarly, when users want to use a different region (for example, because their data lives there) or a different cloud, the company has to set up another Kubernetes deployment entirely.

  • Costs: Most Kubernetes clusters we encounter aren’t set up to take good advantage of cost-saving measures like Spot instances (certainly not as well as we do) or ARM instances.
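To illustrate the hardware constraint, here is a hypothetical sketch of how a fixed menu of node profiles behaves: a request is mapped to the smallest profile that fits, and anything off the menu, such as a GPU, simply cannot be served without an ops change. The profile names and sizes are invented for illustration and are not taken from dask-kubernetes or any particular cluster.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    name: str
    cpus: int
    memory_gib: int
    gpus: int = 0


# Hypothetical fixed menu an ops team might expose (sizes invented).
PROFILES = [
    Profile("small", cpus=2, memory_gib=8),
    Profile("medium", cpus=8, memory_gib=32),
    Profile("large", cpus=32, memory_gib=128),
]


def pick_profile(cpus: int, memory_gib: int, gpus: int = 0) -> Optional[Profile]:
    """Return the smallest predefined profile satisfying the request,
    or None if nothing on the menu fits."""
    for profile in sorted(PROFILES, key=lambda p: (p.cpus, p.memory_gib)):
        if profile.cpus >= cpus and profile.memory_gib >= memory_gib and profile.gpus >= gpus:
            return profile
    return None


print(pick_profile(cpus=4, memory_gib=16))          # the "medium" profile
print(pick_profile(cpus=4, memory_gib=16, gpus=1))  # None: no GPU profile exists
print(pick_profile(cpus=64, memory_gib=512))        # None: bigger than "large"
```

With raw cloud APIs, by contrast, a request like this could be translated directly into whatever VM type fits it.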

But mostly, Kubernetes is just a PITA to maintain.

Dask-Cloudprovider

Dask-cloudprovider avoids the pain of Kubernetes and deploys Dask directly through raw cloud APIs. In many ways, Coiled’s Raw Cloud Architecture was inspired by the success we saw with the Dask-cloudprovider approach.

Why it’s great

Dask-cloudprovider doesn’t require any infrastructure to set up, and it gives you the full scalability of the cloud. All you need is permissions on a cloud account and a Docker image, and you’re good to go. It’s super lightweight.

Why Coiled might be even better

Typically people who shift from Dask-cloudprovider to Coiled do so for the following reasons:

  • Docker images: often their users don’t know Docker well, and even when they do, it’s challenging to keep software environments in sync.

    Coiled’s Package Sync feature ends up being critical here.

  • Fear of leaving things on: because Dask-cloudprovider doesn’t have an always-on service monitoring jobs, it’s unfortunately quite possible for resources like VMs, disk volumes, or network resources to linger uncleaned for months at a time.

    Because Coiled has a centralized platform watching everything that’s going on, users can rest assured that they’re not racking up a cloud bill while they sleep.

  • Security: unfortunately Dask-cloudprovider can’t easily leverage SSL/TLS for communication, and so most data gets transmitted in the clear.
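The benefit of an always-on control plane is that something periodically looks for resources whose owning cluster is gone. The sketch below is a minimal, hypothetical version of such a sweep in plain Python: the `Resource` record, the `cluster_id` tag, and the 15-minute grace period are assumptions for illustration, not how Coiled or dask-cloudprovider actually track resources. A real service would list and delete resources through the cloud provider’s APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A cloud resource as a hypothetical inventory might record it."""
    resource_id: str
    kind: str          # e.g. "vm", "disk", "network"
    cluster_id: str    # tag linking the resource to the cluster that created it
    created_at: float  # seconds since some epoch


def find_orphans(resources, live_cluster_ids, now, grace_s=15 * 60):
    """Return resources whose cluster no longer exists and that are older
    than the grace period (to avoid racing with clusters still starting up)."""
    return [
        r
        for r in resources
        if r.cluster_id not in live_cluster_ids and now - r.created_at > grace_s
    ]


inventory = [
    Resource("i-001", "vm", cluster_id="c1", created_at=0),
    Resource("vol-002", "disk", cluster_id="c2", created_at=0),
    Resource("i-003", "vm", cluster_id="c3", created_at=9_500),
]
live = {"c1"}  # only cluster c1 is still running

for r in find_orphans(inventory, live, now=10_000):
    print(f"would delete {r.kind} {r.resource_id}")
# Only c2's disk is reported; c3's VM is still inside the grace period.
```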

We don’t actually see Dask-cloudprovider much anymore. Our sense is that the Coiled free tier (we manage up to 10,000 CPU-hours every month for free) now serves people who just want to get started without dealing with either technical systems like Kubernetes or companies.
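For a sense of scale, the same 10,000 CPU-hours can be spent in very different shapes. A quick calculation, where the machine sizes are only examples:

```python
FREE_CPU_HOURS_PER_MONTH = 10_000


def hours_available(cpus_per_machine: int, n_machines: int) -> float:
    """Wall-clock hours a cluster of this shape can run within the monthly budget."""
    return FREE_CPU_HOURS_PER_MONTH / (cpus_per_machine * n_machines)


# One 128-core machine, the size of the m6i.32xlarge discussed earlier
print(f"{hours_available(128, 1):.1f} hours")   # 78.1 hours
# 100 machines with 4 CPUs each
print(f"{hours_available(4, 100):.1f} hours")   # 25.0 hours
```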

Dask-Gateway

What it is

Dask-Gateway is an always-on service that tracks and manages Dask clusters for users who are allowed to connect to it. It typically lives in a privileged IT environment and serves Dask users in a client-server configuration. Dask-Gateway is intended to create a stronger separation between IT concerns (like cloud/Kubernetes credentials) and user concerns (like “is my cluster up?”).

Why it’s great

Having a dedicated Dask service is good for both IT and user groups in a few ways:

  • Less expertise required for users: Many Python data scientists and engineers don’t have expertise with Kubernetes or the cloud. Creating some separation there is good.

  • More control for IT: IT people can plug Dask-Gateway into user databases, so in principle they can control more tightly what people can and can’t do

  • Tracking: Having an always-on system that tracks Dask clusters makes them a bit more reliable.
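The split between IT concerns and user concerns can be sketched as a small service that holds the cloud credentials and enforces per-user limits, while users only ever present a token. This is a hypothetical illustration in plain Python; the `ToyGateway` class, its methods, and the limit policy are invented and do not reflect Dask-Gateway’s actual API (real clients connect through `dask_gateway.Gateway`).

```python
from dataclasses import dataclass, field


@dataclass
class ToyGateway:
    """Hypothetical gateway: IT configures credentials and limits once;
    users authenticate with a token and never see cloud credentials."""

    cloud_credentials: str                         # held only by the service
    users: dict                                    # token -> username
    max_workers: dict                              # username -> per-user limit
    clusters: dict = field(default_factory=dict)   # cluster name -> (owner, n_workers)

    def _authenticate(self, token: str) -> str:
        user = self.users.get(token)
        if user is None:
            raise PermissionError("unknown token")
        return user

    def new_cluster(self, token: str, n_workers: int) -> str:
        user = self._authenticate(token)
        limit = self.max_workers.get(user, 0)
        if n_workers > limit:
            raise ValueError(f"{user} may request at most {limit} workers")
        name = f"{user}-{len(self.clusters) + 1}"
        # A real service would call the cloud/Kubernetes API here using
        # self.cloud_credentials; this sketch only records the cluster.
        self.clusters[name] = (user, n_workers)
        return name

    def list_clusters(self, token: str) -> list:
        """Tracking: each user can see their own clusters."""
        user = self._authenticate(token)
        return [name for name, (owner, _) in self.clusters.items() if owner == user]


gateway = ToyGateway(
    cloud_credentials="secret-held-by-IT",
    users={"tok-alice": "alice"},
    max_workers={"alice": 10},
)
name = gateway.new_cluster("tok-alice", n_workers=4)
print(name, gateway.list_clusters("tok-alice"))  # alice-1 ['alice-1']
```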

Why Coiled might be better

Coiled is like Dask-Gateway, just far more mature. Coiled is more robust than Dask-Gateway, easier to set up and manage, easier to connect to enterprise systems like SSO, provides more metrics and observability, and is professionally managed.

Why Coiled might not be best

That doesn’t mean that Coiled is always best. There are a few good reasons why we recommend other solutions sometimes.

  • Money: For heavy usage, Coiled costs money. While we find that the savings Coiled provides (in both cloud and personnel costs) usually far outweigh Coiled’s price, some people are averse to paying companies money, which makes total sense.

    In these cases, we at least hope that the generous free tier helps folks who have only moderate needs.

  • On-prem: For organizations whose data is primarily on-prem, or for whom using the cloud poses an unacceptable security risk, Coiled certainly isn’t the right choice.

  • Smaller use cases: Often you don’t need the cloud or distributed computing. If you don’t, then please avoid the technology. It’s like commuting to work in a tank when a bicycle will do.

There are no bad options

The right deployment technology for a distributed system depends on many factors, like …

  • Your underlying hardware (cloud or on-prem)

  • Your compute needs (just a little, or a heck of a lot)

  • Your security needs (need professional security, or is your data mostly open?)

  • Your IT staff and their interest in maintaining systems

  • Your colleagues and their familiarity with using systems like EC2 or Kubernetes

No solution is optimal in all situations. Fortunately, there are enough good solutions that you should be well covered in any situation. If you still have questions after reading through this, we’re happy to chat through options, even if you think we aren’t a good fit (but we’ll probably try to convince you that we are 😉).