Analytics#

Measurement is the foundation of performance.

Coiled Analytics lets you track Dask usage wherever Dask is run.

Motivation#

When running computations we often ask ourselves questions like the following:

  • Did my computation finish?

  • Did any exceptions occur?

  • How much did that cost me?

  • What is taking most of the time?

  • Why is that cluster still running?

Experienced users know that Dask presents answers to these questions visually through the Dask dashboard. However, the Dask dashboard only tracks the real-time performance of a single Dask cluster. Coiled extends Dask by tracking many Dask clusters across many users and storing those results over time for later analysis. Coiled analytics provides a team-wide view of all clusters over time.

Getting Started#

Coiled Infrastructure#

If you are launching clusters though Coiled then this is already set up for you.

Your own infrastructure#

You can use Coiled analytics on clusters that you manage yourself outside of the Coiled platform. See Install

What information does Coiled Track?#

Coiled tracks aggregate information about cluster activity including the following:

  • Basic level statistics

    • Number of active workers and worker threads

    • Amount of used and total memory

    • Software versions of common libraries

  • Performance statistics

    • Task information, including names, numbers, and compute and transfer durations

    • Profiling, including which functions and lines of code take the most time

    • Code snippets surrounding the Dask calls

    • How long has it been since any work was completed

  • Error tracking

    • Every user-level exception

    • Every dask-level exception

  • User-level tracking

    • Which user within an account created the cluster

    • Costs (estimated when run on non-Coiled architecture)

    • Idleness

This is described in more detail at Data Privacy

User Access#

Everyone within the same account can view all analytics for this account. This is especially valuable in two situations:

  • Team leaders and managers can have a single view over all Dask work within the organization

  • Coiled support staff can be added to an account to give them greater visibility to help in resolving problems.

Accessing Data#

Data can be accessed in two locations:

  • Visually on the web at https://cloud.coiled.io/<your-account-name>/analytics

    See also the Analytics item in your sidebar

  • Programmatically with the coiled.analytics Python API (see API)

Idle Clusters#

Coiled can politely ask your Dask Scheduler to shut down after a suitable idle timeout. This can sometimes help to avoid high costs due to lingering resources.

Idle timeouts are configurable with the following configuration (off by default):

coiled:
  analytics:
    idle:
      timeout: 20 minutes

Note that when running on your own hardware (not managed by Coiled) Coiled can only make a best effort here through Dask. We can not guarantee that things will shut down cleanly (although they usually do) nor do we have any access over instances or network resources beyond the Dask processes.