General Python

Dask futures make it easy to parallelize arbitrary Python code: you submit ordinary Python functions to run on ordinary Python objects. Dask doesn’t care what your code does; it just runs it at the right time on the right machine.
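
The futures interface deliberately mirrors the standard library's concurrent.futures, so the submit/result pattern may already look familiar. For comparison, a minimal sketch using only the standard library (no Dask required):

```python
from concurrent.futures import ThreadPoolExecutor

def inc(x):
    return x + 1

with ThreadPoolExecutor() as pool:
    future = pool.submit(inc, 41)  # schedule the call, get a future back
    result = future.result()       # block until the result is ready

print(result)  # 42
```

Dask's client.submit and Future.result below follow the same shape, but can run the work on many machines.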

You can run these functions sequentially in a simple Python for loop:

import time, random

def inc(x):
    time.sleep(random.random())
    return x + 1

def dec(x):
    time.sleep(random.random())
    return x - 1

def add(x, y):
    time.sleep(random.random())
    return x + y

results = []
for x in range(20):
    result = inc(x)
    result = dec(result)
    results.append(result)

print(results)
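
Since dec simply undoes inc, each iteration returns its input unchanged, so the printed list is list(range(20)). A sleep-free sketch of the same pipeline:

```python
def inc(x):
    return x + 1

def dec(x):
    return x - 1

# Same inc-then-dec pipeline as above, without the random sleeps
results = [dec(inc(x)) for x in range(20)]
print(results)  # [0, 1, 2, ..., 19]
```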

You can use Dask to easily parallelize this same code on your local computer:

from dask.distributed import LocalCluster
import time, random

def inc(x):
    time.sleep(random.random())
    return x + 1

def dec(x):
    time.sleep(random.random())
    return x - 1

def add(x, y):
    time.sleep(random.random())
    return x + y

cluster = LocalCluster(processes=False)
client = cluster.get_client()

results = []
for x in range(20):
    result = client.submit(inc, x)
    result = client.submit(dec, result)
    results.append(result)

results = client.gather(results)
print(results)

You can scale out larger computations to the cloud:

import coiled # use a Coiled cluster
import time, random

def inc(x):
    time.sleep(random.random())
    return x + 1

def dec(x):
    time.sleep(random.random())
    return x - 1

def add(x, y):
    time.sleep(random.random())
    return x + y

cluster = coiled.Cluster(n_workers=20) # run on a cluster in the cloud
client = cluster.get_client()

results = []
for x in range(2000): # scale 100x
    result = client.submit(inc, x)
    result = client.submit(dec, result)
    results.append(result)

results = client.gather(results)
print(results)

More Examples

For more in-depth examples, consider the following: