Coiled Cloud Release Notes#

These release notes are related to updates for cloud.coiled.io.

October 2024#

When you’re using GPU instances, we now set the CUDA version to 12.4 instead of 12.1.

September 2024#

When the scheduler VM for a cluster is completely stuck for (by default) at least 20 minutes, we’ll now automatically shut down the VM (there will be an error message explaining that we did this). This doc has advice about troubleshooting common causes of this problem; for more general information about automatic shutdowns on Coiled, see this doc.

August 2024#

We’ve made some improvements that help get larger AWS clusters in less time. In more detail:

  • For cross-zone clusters, and for any cluster when scaling up, if we’re trying to provision more than 100 workers, we now split this into separate requests to AWS. We’ve found that makes it more likely that AWS provisions the instances sooner, and for cross-zone clusters, we’ve also found that this helps us get more instances even when AWS availability is constrained.
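As a rough illustration of the splitting described above, the sketch below breaks one large worker request into several smaller ones. The function name and the batch size of 100 are assumptions for illustration only; the note does not say how Coiled sizes the individual requests.

```python
from typing import List


def split_into_batches(total_workers: int, batch_size: int = 100) -> List[int]:
    """Split one large worker request into a list of smaller request sizes.

    Hypothetical helper: batch_size=100 is an assumption, not Coiled's
    documented behavior.
    """
    if total_workers < 0 or batch_size <= 0:
        raise ValueError("total_workers must be >= 0 and batch_size must be > 0")
    full, remainder = divmod(total_workers, batch_size)
    # Full batches first, then one smaller batch for whatever is left over.
    return [batch_size] * full + ([remainder] if remainder else [])


# A 250-worker request is expressed as three separate provisioning requests.
print(split_into_batches(250))
```

Each entry in the returned list would correspond to one request to the cloud provider, so a shortfall in one request does not block the others.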

We’ve improved the build time for package sync environments that have many pip-installed packages (we now use uv on our build servers!). This potentially allows clusters to be up and ready sooner, rather than waiting for slow pip-heavy builds.

July 2024#

We now support Single Sign On (SSO). Contact support@coiled.io if you’re interested.

For Azure, we’ve fixed a race condition that was causing us to get fewer workers than desired for large clusters.

2024.04.22#

  • GPU metric charts are now shown (if relevant) along with the other charts of hardware metrics.

2024.04.09#

  • Fixed a bug that prevented pandas from working in Serverless Functions if numexpr was installed (the bug was caused by the way in which OMP_NUM_THREADS was being set and then unset).
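The note above does not describe the fix itself. As a generic sketch of how an environment variable such as OMP_NUM_THREADS can be set temporarily and then restored to exactly its previous state (including "not set at all"), one might write something like the following. The helper name temporary_env is hypothetical and is not part of Coiled.

```python
import os
from contextlib import contextmanager

_MISSING = object()  # sentinel meaning "the variable was not set before"


@contextmanager
def temporary_env(name, value):
    """Set an environment variable for the duration of a block, then restore it."""
    previous = os.environ.get(name, _MISSING)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is _MISSING:
            # The variable did not exist before, so remove it again.
            os.environ.pop(name, None)
        else:
            # The variable had a value before, so put that value back.
            os.environ[name] = previous


with temporary_env("OMP_NUM_THREADS", "1"):
    print(os.environ["OMP_NUM_THREADS"])
```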

2024.03.19#

  • On AWS, Coiled now configures all EC2 instances to require IMDSv2. This is a recommended practice for security, and AWS has stated they will make this requirement the default configuration in mid-2024. We don’t anticipate any impact from this change, but if you’re using EC2 Instance Profiles for providing AWS permissions and see new errors related to AWS permissions in your code, please contact support@coiled.io.

2024.02.09#

  • Fixed a bug that caused long-running Spark tasks to fail because the gRPC connection timeout was too short.

2024.02.08#

  • Enable t2a ARM instances on Google Cloud. These are the only ARM instances currently available on Google Cloud, and Google Cloud is not billing for usage under $222/month until April 1, 2024. You can read more about these instances in the Google Cloud documentation.

2024.02.06#

  • For @coiled.function serverless functions, allow multiple availability zones to be used for the same function run (intra-cluster traffic is very low, so the cost of cross-zone traffic is not a significant concern in this case).

2024.01.26#

  • For Google Cloud, disable OS Login for the VMs created by Coiled. This is required when OS Login is enabled for your GCP project because OS Login prevents SSH keys from working.

2024.01.08#

  • For Google Cloud, better handling of various GCP endpoint connection timeouts when starting and stopping instances. (We’ve seen a recent increase in connection issues with GCP.)

2023.12.15#

  • Fix so that any SSH keys Coiled sets on Google Cloud VMs are not overwritten by any existing keys set at the project level (in project metadata). We use SSH keys for coiled run and coiled cluster ssh.

2023.11.14#

  • Port 15003 is open by default on the scheduler, with encryption and authentication. This port can be used for Spark Connect if you wish to run Spark on a Coiled cluster.

2023.08.31#

  • You can now create workspaces, which are discrete accounts for creating and tracking your clusters.

2023.08.09#

  • Cloud provider setup has been revamped! You can now easily review your Coiled-managed infrastructure in one place, and it’s much easier to configure any custom network options in the UI. For AWS, Coiled can now use role assumption (more secure) instead of a service account access key.

2023.08.01#

  • Tags can be shown on the table of clusters

2023.07.17#

  • Logs can now be exported as CSV from the cluster information page

2023.07.14#

  • When filtering clusters by tag in the web dashboard, tag names and values now have autocomplete

  • Added a pie chart of total task prefix duration (next to the task prefix timeline on the cluster information page)

2023.07.05#

  • Clusters can now be filtered by tags in the web dashboard. For example, if you set a workload tag on your clusters, type workload:prod (and hit return) to see all the clusters where you set the value of the workload tag to prod. (You can also just type the tag label and hit return to see all clusters with that tag set to any value.)
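The filter semantics described above (label:value matches a specific value, a bare label matches any value) can be sketched in a few lines. This is only an illustration of the matching rule, not the dashboard's implementation; the cluster records and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parse_tag_filter(query: str) -> Tuple[str, Optional[str]]:
    """Split 'label:value' into (label, value); a bare 'label' gives (label, None)."""
    label, sep, value = query.strip().partition(":")
    return label, (value if sep else None)


def filter_clusters(clusters: List[Dict], query: str) -> List[Dict]:
    """Keep clusters that carry the tag and, if a value is given, match that value."""
    label, value = parse_tag_filter(query)
    return [
        c for c in clusters
        if label in c["tags"] and (value is None or c["tags"][label] == value)
    ]


# Hypothetical cluster records for illustration.
clusters = [
    {"name": "etl", "tags": {"workload": "prod"}},
    {"name": "dev-box", "tags": {"workload": "dev"}},
    {"name": "scratch", "tags": {}},
]
print([c["name"] for c in filter_clusters(clusters, "workload:prod")])
print([c["name"] for c in filter_clusters(clusters, "workload")])
```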

2023.06.29#

  • Usage (under billing) can be grouped by user-specified tags

2023.06.28#

  • Adds a log message histogram showing the distribution of log messages over time to the cluster logs page

  • Internal changes to how we collect analytics from Dask. These will provide more fine-grained metrics on clusters with distributed>=2023.6.1

2023.06.27#

  • The cluster log view now has clearer indication when logs are being loaded

2023.06.22#

  • Usage (under billing) can be grouped by zone, or you can see the total for your entire account

2023.06.20#

  • Usage (under billing) can be grouped by instance type (as well as by user, as before)

  • Tooltips for metric charts now group range values together

2023.06.09#

  • Metrics, computations and logs now update in real time for running clusters

2023.06.07#

  • Cluster page tabs have been redesigned and renamed to better reflect their content

  • The Cluster Lifecycle information can now be found under the Logs tab

  • The Scheduler Dashboard link has been moved to the cluster front page

2023.06.02#

  • Adds a “total” line (100% x number of cores) to the CPU Worker Total charts

  • The Analytics page has been deprecated in favor of observability information on the cluster details page and usage metrics on the Billing page

  • Fix an issue where the displayed “credits used” would be negative from payments for pay-as-you-go (PAYG) customers.

2023.05.30#

  • Your choice of which columns to show on the table of clusters is now saved (in local browser storage).

  • Clusters are now tagged with Python/dask/distributed versions. You can see cluster tags on the details page for a cluster, and click on a tag to see other clusters with a matching tag.

2023.05.25#

  • Selecting a computation on the cluster page will zoom in all metrics to the duration of the computation. Zooming out will show all computations again.

2023.05.24#

  • Enable all AWS ARM instance types (except the old a1). Previously we had enabled only some specific types (t4g, m6g, m6gd, m7g, c7g, r6g, r6gd, r7g).

  • Allow the stopped clusters list to be sorted by used credits

2023.05.19#

  • Fixed an issue with calculating current month spend. This affected the display on the billing page and the monthly spend limit feature. Invoice amounts are not affected.

2023.05.16#

  • Clients prior to version 0.4.0 can no longer update or create conda/pip backed software environments. To create or update software environments, please upgrade your client to version 0.4.0 or later. Older client versions can continue to create clusters with existing environments.
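A version gate like the one described above has to compare versions numerically rather than as strings (as plain strings, "0.10.1" would sort before "0.4.0"). The sketch below shows one simple way to do that for plain dotted numeric versions. The function names are hypothetical, and it does not handle pre-release or development suffixes.

```python
def parse_version(version: str) -> tuple:
    """Turn '0.4.0' into (0, 4, 0). Only plain dotted numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def can_build_environments(client_version: str, minimum: str = "0.4.0") -> bool:
    """Return True if the client is new enough to create or update environments."""
    return parse_version(client_version) >= parse_version(minimum)


for v in ["0.3.9", "0.4.0", "0.10.1"]:
    print(v, can_build_environments(v))
```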

2023.05.04#

  • There’s a conflict between matplotlib (conda-forge) and pyarrow=12 on Linux, in part because the matplotlib metapackage includes pyqt. When you try to use package sync with an environment that includes matplotlib, we attempt to work around the conflict by ignoring matplotlib and instead installing matplotlib-base (which won’t install pyqt and thus doesn’t conflict with pyarrow=12). Note that if matplotlib is installed using pip, package sync will not install matplotlib on your Coiled cluster. (We plan to address this; for now it’s best to use conda or mamba if you need matplotlib synced to your cluster.)
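The workaround described above amounts to a small substitution rule over the detected package list. The sketch below illustrates that rule only; the function name and the (name, source) tuple representation are assumptions, not package sync's actual data model.

```python
def adjust_for_package_sync(packages):
    """Apply the matplotlib workaround to a list of (name, source) pairs.

    source is 'conda' or 'pip'. Hypothetical representation for illustration.
    """
    adjusted = []
    for name, source in packages:
        if name == "matplotlib":
            if source == "conda":
                # Swap the metapackage for matplotlib-base, which does not pull in pyqt.
                adjusted.append(("matplotlib-base", "conda"))
            # A pip-installed matplotlib is dropped and not replaced.
            continue
        adjusted.append((name, source))
    return adjusted


local_env = [("pyarrow", "conda"), ("matplotlib", "conda"), ("pandas", "conda")]
print(adjust_for_package_sync(local_env))
```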

2023.05.03#

  • Support creating software environments built for ARM instances.

2023.04.27#

  • Bug fix so that Ubuntu 22 custom images now work.

2023.04.12#

  • By default (when not explicitly specifying instance type) on AWS we’ll now prefer non-burstable instances (such as m6i) to burstable instances (such as t3). For details about why we think this is a better default choice, see our blog post about burstable instances.

2023.03.21#

  • Local AWS credentials are now forwarded to scheduler as well as workers. See Personal STS tokens for more details about how we forward AWS credentials to your cluster, or for how this can be disabled.

  • Account administrators can now limit monthly Coiled credits for an individual user or a team from https://cloud.coiled.io/team

2023.02.22#

  • The software environment page has been significantly reworked to display detailed build information, environment history, logs, and the packages actually present in a built environment.

  • Secure dashboards can be used with dask-labextension>=6.1.0 to see dashboard plots inside your Jupyter notebook (except on Safari).

2023.02.13#

Released on February 13, 2023

  • Scheduler dashboards now have HTTPS (encryption) plus authentication. This will not be enabled if you’re using a private IP address for the scheduler.

    Currently Dask JupyterLab Extension won’t work if you have auth on the scheduler dashboard, so you may wish to disable HTTPS and authentication if you’re using the JupyterLab extension to see your dashboard. You can disable HTTPS and authentication for the Scheduler dashboard using the use_dashboard_https keyword argument when creating a cluster.

  • You may now see “flags” on the web dashboard with information about your clusters’ performance (high memory usage, disk usage, etc).

2023.02.08#

Released on February 08, 2023

  • Added cpu/memory charts to the cluster overview page

2023.01.11#

Released on January 11, 2023

  • By default, you’ll just see Dask logs for your (new) cluster, not all system logs. To retrieve full logs, you can use the coiled cluster better-logs CLI.

  • You can now invite people to join your Coiled team by email address, instead of only by Coiled username. This is especially useful for inviting people who do not have Coiled accounts yet.

2022.11.28#

Released on November 28, 2022

  • Package sync build logs are now available in the software tab of the cluster details page

  • Resolved an issue with displaying team roles

  • Disabled AWS GPUs that are not compatible with the CUDA version used on clusters

  • Cluster details page now shows the zone of the cluster

2022.10.27#

Released on October 27, 2022

  • Coiled now attempts to gracefully shut down AWS spot instances when there’s a scheduled “interruption” in the next two minutes, and by default Coiled also requests a replacement instance. If you don’t want Coiled to request a replacement, set "spot_replacement"=False in backend_options. (Note that when you’re okay paying for on-demand instances when you can’t get as many spot instances as you’ve requested, you can set "spot_on_demand_fallback"=True in backend_options.)
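As a minimal sketch, assuming backend_options is an ordinary dictionary passed when creating a cluster, the two settings named above would be written as dictionary entries. The cluster call is left commented out because it needs a configured Coiled account, and the worker count is an arbitrary placeholder.

```python
# Keys are taken from the note above; the values show one possible combination.
backend_options = {
    "spot_replacement": False,        # do not request a replacement after an interruption
    "spot_on_demand_fallback": True,  # fall back to on-demand when spot capacity is short
}

# Assumed usage (requires the coiled client and a configured account):
# import coiled
# cluster = coiled.Cluster(n_workers=10, backend_options=backend_options)

print(backend_options)
```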

2022.10.03#

Released on October 3, 2022

  • Added an overview tab to the cluster details page with cluster status and a consolidated list of potential warnings or errors.

Cluster alerts page in web application

2022.09.21#

Released on September 21, 2022

  • Show upcoming bill amounts on the billing page

  • PAYG customers can set monthly spend limits

2022.09.14#

Released on September 14, 2022

Enhancements#

  • Added tabs to the cluster details page and the cluster analytics page to make it easier for you to navigate between the two pages.

  • Tweaks to improve cluster start time.

  • Experimental support for ARM (Graviton) instances on AWS. This isn’t production ready yet, but let us know if you’re interested in giving it a try and we’d be happy to chat!

2022.06.29#

Released on June 29, 2022

Enhancements#

  • The default persistent disk size for most instance types is now smaller, which will reduce cost and help avoid running into cloud provider quotas. Previously we attached a 100GB disk to every instance; now the default size is between 30GB and 100GB, depending on how much memory (RAM) the instance has. If you know that your workload requires larger disks, you can either specify a larger disk with the worker_disk_size keyword argument when creating a cluster, or on AWS you can use an instance type such as the i3.large with a local NVMe (Coiled will configure the NVMe to be used by Dask for temporary storage).

2022.06.01#

Released on June 1, 2022

Enhancements#

  • We’ve fixed an issue with running xgboost training on v2 clusters.

Deprecated#

  • For new customers, Coiled-hosted is no longer offered; you’ll be able to sign up for Coiled and use your own AWS or GCP account for your clusters.

2022.05.26#

Released on May 26, 2022

Enhancements#

  • Coiled v2 supports spot instances on GCP (AWS spot instances were already supported).

2022.05.20#

Released on May 20, 2022

Enhancements#

  • Coiled v2 now supports GPU instances on both AWS and GCP. Only a single GPU per instance is currently utilized. For AWS, simply use an instance type with an attached GPU; for GCP, you’ll need to use an n1 family instance and attach a guest accelerator using the worker_gpu_type keyword when creating a cluster. See GPUs for more information.

2022.04.28#

Released on April 28, 2022

Enhancements#

  • Coiled v2 Beta clusters now accept the environ and worker_class keyword arguments.

  • Fixed a bug in v2 clusters affecting instance type selection while creating clusters in an account that’s different from your user’s default account.

  • You can now specify an extra service account when configuring your GCP cloud backend. For Coiled v2, this service account will be attached to the instances that Coiled creates so it can access resources in your GCP account.

2022.03.29#

Released on March 29, 2022

Enhancements#

  • Signing up for a pro account now requires account verification

2022.03.22#

Released on March 22, 2022

Deprecated#

  • Creating cluster configurations from the UI is no longer available, in preparation for deprecation

Documentation#

  • Documentation related to creating cluster configuration has been removed in preparation for the deprecation of custom cluster configurations

2022.03.17#

Released on March 17, 2022

Enhancements#

  • Some internal changes to improve stability and reliability.

2022.03.09#

Released on March 9, 2022

Enhancements#

  • Updated style and wording in the activation banner that new accounts see when they log in to Coiled and their account isn’t activated yet.

  • You can now request your account activation directly from the activation banner, by clicking the Activate Coiled Now! button.

2022.02.23#

Released on February 23, 2022

Enhancements#

  • Improved error message when asking for a Cluster that’s over your account node limit. This error message will now contain the number of nodes requested, the account limit, and the cores limit for that account.

Fixes#

  • Fixed issue where accounts created using social login could get an invalid slug. Accounts created using social login will now always get a valid slug.

  • Fixed issue where the core count in the usage tab of the clusters dashboard wouldn’t update once the cluster scales up/down.

2022.02.09#

Released on February 9, 2022

Fixes#

  • Fixed issue where the core count wasn’t being appropriately counted if users specified instance types.

Enhancements#

  • Core count will now get the number of cores from the instance vCPU and update the count as workers start connecting to the scheduler.
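To illustrate the counting rule described above, the sketch below sums the vCPUs of the instances whose workers have connected so far, so the total grows as workers join. The lookup table and function name are hypothetical; the two vCPU values are the published sizes for those AWS instance types.

```python
# Illustrative vCPU counts for two AWS instance types.
VCPUS_BY_INSTANCE_TYPE = {"m5.large": 2, "m5.xlarge": 4}


def connected_core_count(connected_instance_types):
    """Sum the vCPUs of every instance whose worker has connected to the scheduler."""
    return sum(VCPUS_BY_INSTANCE_TYPE[t] for t in connected_instance_types)


# The count is recomputed each time another worker connects.
connected = []
for instance_type in ["m5.xlarge", "m5.xlarge", "m5.large"]:
    connected.append(instance_type)
    print(len(connected), "workers ->", connected_core_count(connected), "cores")
```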

Documentation#

  • Added a section on the new wait_for_workers keyword argument of the coiled.Cluster() constructor. This argument is used to make sure that the Cluster is ready to start a computation, and to return more information to the user when the Cluster can’t get workers.

  • Added a section on Docker to be used with Coiled when creating software environments.

2022.01.26#

Released on January 26, 2022

Fixes#

  • Fixed an issue that was causing the reset password page to reload continuously, preventing users from choosing a new password.

  • Fixed issue that was causing clusters not to stop when requested by the user, if the cluster was created in a different availability zone than the default one.

Enhancements#

  • You are now able to specify any instance type available from your cloud provider of choice. You might wish to run the command coiled.get_notifications(level="ERROR") if you have issues creating clusters with the specified instance types.

2022.01.12#

Released on January 12, 2022

Fixes#

  • Fixed issue where setting nthreads when launching a cluster wasn’t respected. You can override the number of worker threads by passing worker_options={"nthreads": <number of threads>} to the coiled.Cluster constructor.

  • Removed references to Azure from Coiled Cloud

Enhancements#

  • For AWS, VPC creation that runs when you set your backend options to run Coiled on your cloud provider of choice will now create one subnet for each Availability Zone in the region you chose to run Coiled.

  • You can now specify an Availability Zone when creating a cluster (you might need to rerun the VPC creation process).

  • Periodic cleanup will now clean up resources in different Availability Zones.

Documentation#

  • Added warning in the Firewall and Networking section of the cloud providers documentation that this feature is under active development and is in an experimental phase.

2021.12.15#

Released on December 15, 2021

Fixes#

  • Fixed a frontend issue where a customer’s payment info was not showing up even though it had been entered.

  • Fixed an intermittent issue where users for some credit cards were unable to enter their security code. This has been fixed and all credit cards should work consistently.

Enhancements#

  • Dask workers now use public IPs so that NAT Gateway is no longer needed; ingress to workers is still blocked. If you bring your own network, you can disable public IPs for workers by setting the give_workers_public_ip option.

  • Added a UI for BYO network so network options can also be configured through the UI when selecting your backend.

  • Free tier account usage is still on an opt-in model. If you are a new user please contact support@coiled.io to enable software environments and cluster creations.

  • Azure functionality has been removed and disabled for users. Users previously hosted on Coiled-hosted Azure have been migrated to the AWS backend.

Documentation#

  • Fixed a couple of broken links in the Manage Users documentation on teams.

  • Added more examples to the BYO network documentation.

2021.12.01#

Released on December 1, 2021

Enhancements#

  • Added ability to manage API access tokens using (optional) expiration dates or manual revocation. Added support for managing API tokens via the Coiled Python client.

  • Added account limit alert when 99% of the quota is used and when your account has reached its quota limit.

  • Changed the default to use on-demand VMs for Dask workers as opposed to spot or preemptible instances. Backend options can still be set to use spot or preemptible instances, see AWS backend options or GCP backend options.

  • Added ability to use pre-existing cloud resources (e.g., VPC, subnets, security groups) when running Coiled in your own cloud provider account.

Deprecated#

  • Coiled Notebooks and Coiled Jobs have been deprecated.

Documentation#

  • As part of upcoming deprecation of the Azure cloud provider backend, the documentation related to Azure has been removed.

  • Coiled client version of 0.0.55 or higher is required - please update your client if needed.

2021.11.10#

Released on November 10, 2021

Fixes#

  • Dask workers will now use all CPU/Memory available for the instance type in which they have been created. In the past, workers would be limited by your CPU/Memory specification.

Enhancements#

  • Moved the Coiled Subscription tab up on the account settings page to make it easier for you to see how many credits you have used so far.

  • If you are using Coiled on your cloud provider, you can now customize ingress rules for the firewall/security group created by Coiled by specifying ingress ports and a CIDR block.

Deprecated#

  • Coiled Notebooks and Coiled Jobs were experimental features that are being deprecated. After December 1, 2021, these will no longer be available.

Documentation#

  • Updated the list of dependencies in the software environments documentation page to include dask[complete] while creating a software environment with pip.

  • Added troubleshooting article for repeated cluster timeout.

  • Embedded tutorial videos for cluster configuration and software environments documentation.

2021.10.27#

Released on October 27, 2021

Fixes#

  • The route table for the private subnet that is created when Coiled creates a VPC in your AWS account is now called coiled-vm-private-router instead of coiled-vm-public-router.

  • Mitigate Rate Limit exceptions when performing some actions like scaling clusters, which should improve cluster reliability.

  • Software environment names must now be lowercase only.

Enhancements#

  • Removed experimental warnings for GCP and Azure in the UI when choosing a backend option for an account.

  • Removed fallback option to fetch logs from instances via SSH.

Documentation#

  • Removed experimental notes for GCP and Azure in the respective section of the documentation for these backends.

  • Updated default worker_memory to 8GiB in a few pages where it was saying that the default was 16GiB.

  • Added a section about network architecture to the security page.

  • Added a tutorial on Allowable Instance Types.

  • Added a tutorial on GPUs.

  • Added section on selecting instance types

  • Added a Networking section on the documentation page that explains how you can specify your AWS security groups using the new arguments enable_public_http, enable_public_ssh and disable_public_ingress.

2021.10.13#

Released on October 13, 2021

Fixes#

  • Environment variables sent to the Cluster with the keyword argument environ= are now being converted to strings, which fixes occasional failures when sending non-string values to the Cluster.
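Environment variable values must be strings at the operating-system level, which is why non-string values could cause failures. A minimal sketch of the kind of conversion described above, assuming a plain str() conversion (the note does not say exactly how Coiled converts values), looks like this. The function name is hypothetical.

```python
def stringify_environ(environ):
    """Return a copy of the mapping with every key and value converted to str."""
    return {str(key): str(value) for key, value in environ.items()}


# Integers and booleans become their string forms.
user_environ = {"OMP_NUM_THREADS": 4, "DEBUG": True, "REGION": "us-east-1"}
print(stringify_environ(user_environ))
```

Note that under this assumption a value of None would become the string "None", so it is usually clearer to leave unset variables out of the mapping entirely.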

Enhancements#

  • You can now use Coiled in your own GCP account.

  • You can now use Coiled in your own Azure account.

  • You can now select a region or zone when launching clusters in GCP.

  • You can now create software environments using Docker images stored in your private ECR (AWS), ACR (Azure) or GAR (GCP) container registries, in addition to Docker Hub and other registries, by calling coiled.create_software_environment(container="<URI>").

  • Coiled now collects statistical profiling data from your Dask clusters. This data is visualized as a flame graph on the Analytics page for individual clusters.

  • You can now hide/show columns in the Clusters Dashboard. The options are: Id, Cluster Name, Created By, Status, Num Workers, Software Environment, Cost (current), Cost(total), Last Seen, Backend, Runtime, Spot/Preemptible.

  • Improve log filtering for AWS when viewing logs in the Coiled UI.

Documentation#

  • Added a new example on using the Dask Snowflake connector.

  • Fix link to Coiled’s privacy policy in the security page.

  • Added a new section in the GPUs documentation to demonstrate how to use GPUs with the Afar library to run remote commands.

2021.09.28#

Released on September 28, 2021

Fixes#

  • Resolve error that was throwing an “Unable to stop cluster” error message in the Clusters Dashboard for users using the Azure backend.

  • Fix issue with workers not being created when users create a new Cluster using the AWS backend.

  • Resolve error that was causing Clusters to shut down immediately upon creation for users using the AWS backend.

  • Fix issue that was causing the Cluster Dashboard table to show zero workers count even though the workers were created and connected to the scheduler.

Enhancements#

  • Add label containing the instance name to notification when running coiled.get_notifications().

Documentation#

  • Fix typo in CLI command: the documentation mentioned coiled inspect, but the right command is coiled env inspect.

  • Update Manage Users page to better explain the distinction between Accounts and Teams.