Distributed printing#

Dask makes it easy to print whether you’re running code locally on your laptop, or remotely on a cluster in the cloud.

One of the most basic things programmers do is print text to their screen.

Printing is often used for things like debugging:

# ...
print("Made it here...")
# ...

or to signal progress:

for i in range(10):
    print(f"Done with iteration {i}")

However, when running code at scale on a Dask cluster even simple print calls can become non-intuitive.

For example, this snippet runs two functions on a Dask/Coiled cluster in the cloud:

import coiled
import dask

# Spin up cluster
cluster = coiled.Cluster()
client = cluster.get_client()

@dask.delayed
def inc(x):
    print(f"Calling inc on {x}")
    return x + 1

@dask.delayed
def add(x, y):
    print(f"Calling add on {x} + {y}")
    return x + y

# Run computations on the cluster
x = inc(1)
y = add(x, 7)
y.compute()

Both the inc and add functions have print calls and, naively, I’d expect to see

Calling inc on 1
Calling add on 2 + 7

printed to my screen when I run this example. However, when you run this code snippet you don’t see any output. That’s because those prints happen on the remote Dask workers (in this case on AWS) where these tasks are run. We can see the result of these prints by looking our cluster logs.

print-in-worker-logs

It’s good that these prints work, but it’s not a very pleasant experience and probably doesn’t match most user’s expectations.

To make printing on a cluster similar to on programmer’s local machine, Dask has a dask.distributed.print function which mirrors Python’s built-in print function but, when called in a task running on a cluster, the output of the print will be forwarded back to the user’s client Python session and printed there.

Here’s the same example as above, but with one line added to use Dask’s print:

import coiled
import dask
from dask.distributed import print  # This line is the only difference

# Spin up cluster
cluster = coiled.Cluster()
client = cluster.get_client()

@dask.delayed
def inc(x):
    print(f"Calling inc on {x}")
    return x + 1

@dask.delayed
def add(x, y):
    print(f"Calling add on {x} + {y}")
    return x + y

# Run computations on the cluster
x = inc(1)
y = add(x, 7)
y.compute()

Running this on my laptop prints

Calling inc on 1
Calling add on 2 + 7

in my IPython session 🎉

When using prints inside tasks running on a Dask cluster, know that dask.distributed.print is available and probably what you want to use.

Tip

Dask also has a dask.distributed.warn utility, which mirrors Python’s built-in warnings.warn that forwards warnings to user’s client Python session.