Automatic Shutdown
Coiled will automatically shut down your cluster under several conditions to help you save money when we determine that your cluster is idle or unable to do work.
Idle Shutdown
Idle shutdown is triggered by the Dask scheduler when no work has been done on the cluster for a set amount of time.
By default, this timeout is 20 minutes for Dask clusters. You may want to increase this timeout if
you want to keep a running cluster on “standby” to quickly accept and process tasks throughout the day, or
you have a very large task graph that takes a long time to generate and submit to the scheduler.
This timeout is configurable using the idle_timeout keyword argument.
For example, here we start a cluster with the idle timeout set to 2 hours:
import coiled
cluster = coiled.Cluster(idle_timeout="2 hours")
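The idle rule comes down to comparing the time since the scheduler last saw activity against the timeout. The sketch below models that check with the standard library only. `parse_duration` and `should_shut_down_idle` are hypothetical helpers for illustration, not part of Coiled or Dask, and the small set of duration units the parser accepts is an assumption.

```python
from datetime import datetime, timedelta

# Hypothetical unit table; Coiled's own parser may accept more spellings.
_UNITS = {
    "second": 1, "seconds": 1, "s": 1,
    "minute": 60, "minutes": 60, "min": 60,
    "hour": 3600, "hours": 3600, "h": 3600,
}


def parse_duration(text: str) -> timedelta:
    """Parse strings like "20 minutes" or "2 hours" (hypothetical helper)."""
    amount, unit = text.split()
    return timedelta(seconds=float(amount) * _UNITS[unit.lower()])


def should_shut_down_idle(last_activity: datetime, now: datetime,
                          idle_timeout: str = "20 minutes") -> bool:
    """Return True once the cluster has been idle for at least idle_timeout."""
    return now - last_activity >= parse_duration(idle_timeout)


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 9, 0)
    # 30 minutes idle: past the 20 minute default, within a 2 hour timeout.
    now = last + timedelta(minutes=30)
    print(should_shut_down_idle(last, now))              # True
    print(should_shut_down_idle(last, now, "2 hours"))   # False
```

With the 2 hour setting from the example above, the same 30 minute gap no longer triggers a shutdown.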
No Client Shutdown
No client shutdown is triggered by the Dask scheduler when all Dask clients have disconnected and no client has reconnected after some amount of time.
By default, this timeout is 2 minutes for Dask clusters. If you specify a non-default idle timeout, that value is also used as the no client shutdown timeout.
You may want to increase or disable this timeout if
you want to keep a running cluster on “standby” to quickly accept and process tasks throughout the day, or
you’re on an especially unreliable internet connection between the client and the cluster.
This timeout is configurable using the no_client_timeout keyword argument.
For example, here we start a cluster with the no client timeout disabled:
import coiled
cluster = coiled.Cluster(no_client_timeout=None)
This timeout is automatically disabled when the shutdown_on_close keyword argument is False.
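The rules above combine as follows: a 2 minute default, a non-default idle timeout that carries over, None to disable, and shutdown_on_close=False also disabling it. The function below is a hypothetical model of that combination, not Coiled's code. Two points are assumptions not stated in this section: an explicitly passed `no_client_timeout` takes precedence over an inherited idle timeout, and the timeouts are represented as `timedelta` values.

```python
from datetime import timedelta
from typing import Optional

DEFAULT_IDLE_TIMEOUT = timedelta(minutes=20)
DEFAULT_NO_CLIENT_TIMEOUT = timedelta(minutes=2)

# Sentinel distinguishing "argument not passed" from an explicit None.
_UNSET = object()


def effective_no_client_timeout(
    idle_timeout: timedelta = DEFAULT_IDLE_TIMEOUT,
    no_client_timeout: object = _UNSET,
    shutdown_on_close: bool = True,
) -> Optional[timedelta]:
    """Return the no client timeout in force, or None if it is disabled."""
    if not shutdown_on_close:
        # shutdown_on_close=False disables no client shutdown entirely.
        return None
    if no_client_timeout is not _UNSET:
        # Assumption: an explicit value (including None) wins.
        return no_client_timeout
    if idle_timeout != DEFAULT_IDLE_TIMEOUT:
        # A non-default idle timeout is reused for the no client timeout.
        return idle_timeout
    return DEFAULT_NO_CLIENT_TIMEOUT


print(effective_no_client_timeout())  # 0:02:00
print(effective_no_client_timeout(idle_timeout=timedelta(hours=2)))  # 2:00:00
print(effective_no_client_timeout(no_client_timeout=None))  # None
print(effective_no_client_timeout(shutdown_on_close=False))  # None
```

The sentinel is what lets the model tell "use the default" apart from "explicitly disabled", which `None` alone cannot express.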
Unresponsive Scheduler Shutdown
Unresponsive scheduler shutdown is triggered by the Coiled control plane when the scheduler VM has failed to send heartbeats for 20 minutes.
The timeout duration is not configurable.
The most common cause of this timeout is the scheduler running out of memory. See this doc for advice on how to troubleshoot further.
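From the control plane's side, this check behaves like a fixed-threshold heartbeat watchdog. The sketch below is a hypothetical model: the `HeartbeatWatchdog` class and its methods are illustrative and not a Coiled API. It hard-codes the 20 minute window to reflect that this timeout is not configurable. Treating VM start as the first heartbeat is an assumption made so that a freshly booted scheduler is not flagged immediately.

```python
from datetime import datetime, timedelta

# Fixed window; mirrors the non-configurable 20 minute threshold.
HEARTBEAT_TIMEOUT = timedelta(minutes=20)


class HeartbeatWatchdog:
    """Hypothetical control-plane view of one scheduler VM."""

    def __init__(self, started_at: datetime) -> None:
        # Assumption: VM start counts as the first heartbeat.
        self.last_heartbeat = started_at

    def record_heartbeat(self, at: datetime) -> None:
        """Remember the time of the most recent heartbeat."""
        self.last_heartbeat = at

    def is_unresponsive(self, now: datetime) -> bool:
        """True once no heartbeat has arrived for the full window."""
        return now - self.last_heartbeat >= HEARTBEAT_TIMEOUT


start = datetime(2024, 1, 1, 12, 0)
watchdog = HeartbeatWatchdog(start)
watchdog.record_heartbeat(start + timedelta(minutes=5))
print(watchdog.is_unresponsive(start + timedelta(minutes=20)))  # False
print(watchdog.is_unresponsive(start + timedelta(minutes=25)))  # True
```

A scheduler that has run out of memory typically stops sending heartbeats, so in this model it would cross the threshold 20 minutes after its last successful one.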