Hyperparameter Optimization at Scale with XGBoost, Optuna, and Dask

In the example on Hyperparameter Optimization with XGBoost, we saw how to perform hyperparameter optimization with Dask + Optuna + XGBoost when the data fits comfortably in memory. When the data doesn’t fit in memory, we’d like to parallelize each XGBoost training call with Dask as well. However, the current xgboost.dask implementation takes over the entire Dask cluster, so running many such trainings at once on a single cluster is problematic.

In this guide, you’ll learn how to perform hyperparameter optimization using Optuna with multiple Dask clusters that train several models using xgboost.dask. This work extends the example shown in Hyperparameter Optimization with XGBoost, but now we’ll run many model trainings in parallel, each in a separate Dask cluster. Download this Jupyter notebook or check out https://github.com/coiled/dask-xgboost-nyctaxi to run this example yourself.

In this example we:

  • Load a single, consistent dataset from S3.

  • Create multiple Dask clusters using Coiled, one for each thread in a local thread pool.

  • Run Optuna locally on that thread pool.

[Figure: conceptual diagram of using multiple clusters]

Before you start

You’ll first need to install the necessary packages. For the purposes of this example, we’ll do this in a new virtual environment, but you could also install them in whatever environment you’re already using for your project.

$ conda create -n parallel-hpo-mc-example -c conda-forge python=3.10 dask coiled s3fs pyarrow dask-ml optuna matplotlib
$ conda activate parallel-hpo-mc-example
$ pip install xgboost

You could also use pip for everything, or any other package manager you prefer; conda isn’t required.

When you create a cluster, Coiled will automatically replicate your local parallel-hpo-mc-example environment to your cluster.

About the Data

In this example we will use a dataset that the Coiled team created by pre-processing the Uber/Lyft dataset from the High-Volume For-Hire Services and joining it with the NYC Taxi Zone Lookup Table. The result is a dataset with ~1.4 billion rows.
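If you’d like to peek at the feature table before spinning up any clusters, here is a minimal sketch that inspects only the Parquet metadata. The storage_options={"anon": True} argument is an assumption for anonymous access; drop it if your AWS credentials already cover this bucket.

import dask.dataframe as dd

# Inspect the schema without loading the full ~1.4 billion rows.
# storage_options={"anon": True} is assumed here for anonymous access;
# remove it if your AWS credentials already grant access.
ddf = dd.read_parquet(
    "s3://coiled-datasets/dask-xgboost-example/feature_table.parquet/",
    storage_options={"anon": True},
)
print(ddf.dtypes)       # column names and dtypes (metadata only)
print(ddf.npartitions)  # number of Parquet partitions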

About the Model

Each Dask cluster will train a single XGBoost model using the xgboost.dask module built into XGBoost. On each cluster, we:

  • Load the data

  • Perform basic feature engineering (data type optimization, categorization)

  • Train a single model with XGBoost, using custom cross-validation

Get Clusters and Data

import threading
from typing import Dict

import coiled
import dask
import dask.dataframe as dd
import distributed
import numpy as np

# Dask workers per cluster
N_WORKERS = 50

# Total number of EC2 worker instances spun up = N_JOBS * N_WORKERS, where N_JOBS
# is the number of parallel Optuna jobs to run


# We cache the results in a clusters dictionary
clusters: Dict[int, tuple[distributed.Client, dd.DataFrame]] = {}


def get_ddf() -> tuple[distributed.Client, dd.DataFrame]:
    thread_id = threading.get_ident()
    # checking if there are already results cached for a given thread
    # if so, return them. If not, continue with the rest of the function
    try:
        return clusters[thread_id]
    except KeyError:
        pass

    cluster = coiled.Cluster(
        name=f"xgb-nyc-taxi-gbh-{thread_id}",
        worker_vm_types=["r6i.large"],
        scheduler_vm_types=["m6i.large"],
        n_workers=N_WORKERS,
        show_widget=False,
        backend_options={
            "region": "us-east-2",
            "multizone": True,
        },
        scheduler_options={"idle_timeout": "15 minutes"},
    )

    client = distributed.Client(cluster, set_as_default=False)
    print("Started cluster at", client.dashboard_link)

    with client.as_current():
        # load data
        ddf = dd.read_parquet(
            "s3://coiled-datasets/dask-xgboost-example/feature_table.parquet/"
        )

        # Reduce dataset size. Uncomment to speed up the exercise.
        # ddf = ddf.partitions[:20]

        # Basic feature engineering
        # Under the hood, XGBoost converts floats to `float32`.
        # Let's do it only once here.
        float_cols = ddf.select_dtypes(include="float").columns.tolist()
        ddf = ddf.astype({c: np.float32 for c in float_cols})

        # We need the categories to be known
        categorical_vars = ddf.select_dtypes(include="category").columns.tolist()

        # categorize() reads the whole input and then discards it.
        # Let's read from disk only once.
        ddf = ddf.persist()
        ddf = ddf.categorize(columns=categorical_vars, scheduler=client)

        # We will need to access this multiple times. Let's persist it.
        ddf = ddf.persist()

        clusters[thread_id] = client, ddf
        return client, ddf

Custom cross-validation

In this example we show how you can use a custom cross-validation function such as the following:

from collections.abc import Iterator

# Number of folds determines the train/test split
# (e.g. N_FOLDS=5 -> train=4/5 of the total data, test=1/5)
N_FOLDS = 5


def make_cv_splits(
    ddf: dd.DataFrame, n_folds: int = N_FOLDS
) -> Iterator[tuple[dd.DataFrame, dd.DataFrame]]:
    frac = [1 / n_folds] * n_folds
    splits = ddf.random_split(frac, shuffle=True)
    for i in range(n_folds):
        train = [splits[j] for j in range(n_folds) if j != i]
        test = splits[i]
        yield dd.concat(train), test
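As a quick sanity check, you can run make_cv_splits on a small toy frame to confirm the train/test proportions. This sketch uses a hypothetical 100-row DataFrame; the trip_time column just mirrors the target used later in train_model.

import dask.dataframe as dd
import pandas as pd

# Toy data purely for illustration
toy = dd.from_pandas(
    pd.DataFrame({"x": range(100), "trip_time": range(100)}), npartitions=4
)
for i, (train, test) in enumerate(make_cv_splits(toy, n_folds=5)):
    # random_split is approximate, so expect roughly 80/20 per fold
    print(f"fold {i}: train={len(train)} rows, test={len(test)} rows")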

Train Model

Notice that we are training one model per cluster.

When using XGBoost with Dask, we call the XGBoost Dask interface from the client side. The main difference from XGBoost’s single-machine interface is that we pass our Dask client as an additional argument to carry out the computation. Note that if the client argument is set to None, as in the xgboost.dask.predict call below, XGBoost uses the current default Dask client.

from typing import Dict

import dask.array as da
import xgboost
from dask_ml.metrics import mean_squared_error

def train_model(study_params: Dict[str, float]) -> float:
    scores = []
    client, ddf = get_ddf()

    with client.as_current():
        for train, test in make_cv_splits(ddf):
            y_train = train["trip_time"]
            X_train = train.drop(columns=["trip_time"])
            y_test = test["trip_time"]
            X_test = test.drop(columns=["trip_time"])

            d_train = xgboost.dask.DaskDMatrix(
                client, X_train, y_train, enable_categorical=True
            )
            model = xgboost.dask.train(
                client,
                {"tree_method": "hist", **study_params},
                d_train,
                num_boost_round=4,
                evals=[(d_train, "train")],
            )
            predictions = xgboost.dask.predict(None, model, X_test)
            score = mean_squared_error(
                y_test.to_dask_array(),
                predictions.to_dask_array(),
                squared=False,
                compute=False,
            )
            # Compute predictions and mean squared error for this iteration
            # while we start the next one
            scores.append(score.reshape(1).persist())
            del d_train, model, predictions, score

        scores = da.concatenate(scores).compute()
        return scores.mean()

Create objective function

In this example we use Optuna to optimize several hyperparameters to train an XGBoost model. There is no Dask-specific code here; this is exactly the same code you would write to run Optuna on your local machine.

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 75, 125),
        "learning_rate": trial.suggest_float("learning_rate", 0.5, 0.7),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1),
        "colsample_bynode": trial.suggest_float("colsample_bynode", 0.5, 1),
        "colsample_bylevel": trial.suggest_float("colsample_bylevel", 0.5, 1),
        "reg_lambda": trial.suggest_float("reg_lambda", 0, 1),
        "max_depth": trial.suggest_int("max_depth", 1, 6),
        "max_leaves": trial.suggest_int("max_leaves", 0, 2),
        "max_cat_to_onehot": trial.suggest_int("max_cat_to_onehot", 1, 10),
    }
    return train_model(params)

Create a single Optuna study

from datetime import datetime
import optuna

Note

The following cell will start 10 clusters and perform 50 trials across those clusters, where each trial trains an XGBoost model with 5-fold cross-validation. This takes approximately 1.5 hours. If you want to experiment with this example, we recommend using a small subset of the data (see the instructions in the get_ddf function above) or reducing the number of trials (see the final plot for reference on convergence).

%%time
# Number of parallel optuna jobs to run
N_JOBS = 10
# Total number of converging trials to run across the various jobs
N_TRIALS = 50

start = datetime.now()
study = optuna.create_study(study_name="parallel-nyc-travel-time-model")
study.optimize(objective, n_trials=N_TRIALS, n_jobs=N_JOBS)
print(f"Total time:  {datetime.now() - start}")
[I 2023-03-07 16:33:41,139] A new study created in memory with name: parallel-nyc-travel-time-model
Started cluster at https://cluster-uijcb.dask.host?token=XOy80Ze_VMszSqcJ
Started cluster at https://cluster-rupur.dask.host?token=K8_BfzuFN5qZE9nE
Started cluster at https://cluster-lantd.dask.host?token=2Z_vUwWul-bMeKWm
Started cluster at https://cluster-kczql.dask.host?token=mzcZuUPJTEUhyN7W
Started cluster at https://cluster-oeayy.dask.host?token=izY0aWOZ_awSqiPJ
Started cluster at https://cluster-akkqn.dask.host?token=Tc4is6XJX9oB0INe
Started cluster at https://cluster-sjeeg.dask.host?token=zIODfWyX79gklPe1
Started cluster at https://cluster-jjmgz.dask.host?token=mSVqQPg_Ntrtit9U
Started cluster at https://cluster-oafke.dask.host?token=xLV6U2ZWyqjDvtm1
Started cluster at https://cluster-nclvd.dask.host?token=lu0wzKCJuISQ37rU
[I 2023-03-07 16:50:57,736] Trial 9 finished with value: 459.91804561799154 and parameters: {'n_estimators': 108, 'learning_rate': 0.6399848233366736, 'colsample_bytree': 0.5535380852398557, 'colsample_bynode': 0.6709304141713173, 'colsample_bylevel': 0.9467542799086617, 'reg_lambda': 0.5171355442461827, 'max_depth': 2, 'max_leaves': 0, 'max_cat_to_onehot': 6}. Best is trial 9 with value: 459.91804561799154.
[I 2023-03-07 16:51:01,738] Trial 0 finished with value: 706.5706220081988 and parameters: {'n_estimators': 110, 'learning_rate': 0.6698534462711541, 'colsample_bytree': 0.6797323372337529, 'colsample_bynode': 0.7801900079820215, 'colsample_bylevel': 0.8946977001547145, 'reg_lambda': 0.19664684910323915, 'max_depth': 4, 'max_leaves': 1, 'max_cat_to_onehot': 5}. Best is trial 9 with value: 459.91804561799154.
[I 2023-03-07 16:51:57,419] Trial 5 finished with value: 626.7690927469862 and parameters: {'n_estimators': 97, 'learning_rate': 0.61273599393898, 'colsample_bytree': 0.5939265634726271, 'colsample_bynode': 0.5090902162095029, 'colsample_bylevel': 0.8543647526671865, 'reg_lambda': 0.5739675974626878, 'max_depth': 3, 'max_leaves': 2, 'max_cat_to_onehot': 9}. Best is trial 9 with value: 459.91804561799154.
[I 2023-03-07 16:52:18,477] Trial 2 finished with value: 450.04005372042775 and parameters: {'n_estimators': 106, 'learning_rate': 0.5497999643418078, 'colsample_bytree': 0.8297876657256671, 'colsample_bynode': 0.9349242189370874, 'colsample_bylevel': 0.872796401227229, 'reg_lambda': 0.010059589440354233, 'max_depth': 1, 'max_leaves': 0, 'max_cat_to_onehot': 5}. Best is trial 2 with value: 450.04005372042775.
[I 2023-03-07 16:52:18,698] Trial 7 finished with value: 706.5247900842206 and parameters: {'n_estimators': 103, 'learning_rate': 0.6895092824378255, 'colsample_bytree': 0.8653853328322703, 'colsample_bynode': 0.8732351985342599, 'colsample_bylevel': 0.8001110690639806, 'reg_lambda': 0.3991736147254483, 'max_depth': 4, 'max_leaves': 1, 'max_cat_to_onehot': 1}. Best is trial 2 with value: 450.04005372042775.
[I 2023-03-07 16:52:53,727] Trial 4 finished with value: 707.0722534262718 and parameters: {'n_estimators': 78, 'learning_rate': 0.5938273029945034, 'colsample_bytree': 0.8231356220954428, 'colsample_bynode': 0.9035952358667221, 'colsample_bylevel': 0.5826773119883825, 'reg_lambda': 0.11359868061517109, 'max_depth': 3, 'max_leaves': 1, 'max_cat_to_onehot': 3}. Best is trial 2 with value: 450.04005372042775.
[I 2023-03-07 16:53:20,336] Trial 1 finished with value: 516.5998683845554 and parameters: {'n_estimators': 96, 'learning_rate': 0.6962078235177506, 'colsample_bytree': 0.6290964678545394, 'colsample_bynode': 0.8649705309132565, 'colsample_bylevel': 0.5934239573531139, 'reg_lambda': 0.3799276008709741, 'max_depth': 5, 'max_leaves': 2, 'max_cat_to_onehot': 7}. Best is trial 2 with value: 450.04005372042775.
[I 2023-03-07 16:53:23,962] Trial 8 finished with value: 438.5476449497015 and parameters: {'n_estimators': 107, 'learning_rate': 0.6475635811563369, 'colsample_bytree': 0.9360100314496721, 'colsample_bynode': 0.6936171669969478, 'colsample_bylevel': 0.7293969787062828, 'reg_lambda': 0.6276093737313736, 'max_depth': 3, 'max_leaves': 0, 'max_cat_to_onehot': 10}. Best is trial 8 with value: 438.5476449497015.
[I 2023-03-07 16:55:10,185] Trial 3 finished with value: 347.17865923346363 and parameters: {'n_estimators': 82, 'learning_rate': 0.6885561801784386, 'colsample_bytree': 0.9321950766917282, 'colsample_bynode': 0.8536491941088509, 'colsample_bylevel': 0.9398038249417635, 'reg_lambda': 0.29542382198539385, 'max_depth': 4, 'max_leaves': 0, 'max_cat_to_onehot': 4}. Best is trial 3 with value: 347.17865923346363.
[I 2023-03-07 16:55:24,913] Trial 6 finished with value: 450.8228409416054 and parameters: {'n_estimators': 95, 'learning_rate': 0.5810255020460922, 'colsample_bytree': 0.6243932953989939, 'colsample_bynode': 0.6371895268042804, 'colsample_bylevel': 0.9967395138722356, 'reg_lambda': 0.6838365057984627, 'max_depth': 3, 'max_leaves': 0, 'max_cat_to_onehot': 3}. Best is trial 3 with value: 347.17865923346363.
[I 2023-03-07 17:03:58,091] Trial 11 finished with value: 502.2055119602801 and parameters: {'n_estimators': 104, 'learning_rate': 0.5775782551942003, 'colsample_bytree': 0.7823077065309351, 'colsample_bynode': 0.9433870837819863, 'colsample_bylevel': 0.6448375848139956, 'reg_lambda': 0.5235896369341139, 'max_depth': 1, 'max_leaves': 2, 'max_cat_to_onehot': 7}. Best is trial 3 with value: 347.17865923346363.
[I 2023-03-07 17:05:39,246] Trial 13 finished with value: 451.9085990155357 and parameters: {'n_estimators': 112, 'learning_rate': 0.5696381676907815, 'colsample_bytree': 0.8832417782032596, 'colsample_bynode': 0.5389457552721898, 'colsample_bylevel': 0.9752355620655051, 'reg_lambda': 0.33593323922732166, 'max_depth': 6, 'max_leaves': 2, 'max_cat_to_onehot': 9}. Best is trial 3 with value: 347.17865923346363.
[I 2023-03-07 17:06:20,708] Trial 16 finished with value: 707.7125849514883 and parameters: {'n_estimators': 95, 'learning_rate': 0.5561283797322206, 'colsample_bytree': 0.7074835580916361, 'colsample_bynode': 0.5038400571799664, 'colsample_bylevel': 0.8999590737987622, 'reg_lambda': 0.27147822743976624, 'max_depth': 1, 'max_leaves': 1, 'max_cat_to_onehot': 6}. Best is trial 3 with value: 347.17865923346363.
[I 2023-03-07 17:06:28,554] Trial 12 finished with value: 706.5676108583787 and parameters: {'n_estimators': 120, 'learning_rate': 0.670916503750123, 'colsample_bytree': 0.6630550917874198, 'colsample_bynode': 0.825776049572102, 'colsample_bylevel': 0.5717548581782297, 'reg_lambda': 0.0994383155335109, 'max_depth': 6, 'max_leaves': 1, 'max_cat_to_onehot': 2}. Best is trial 3 with value: 347.17865923346363.
[I 2023-03-07 17:06:46,323] Trial 10 finished with value: 706.5319100137592 and parameters: {'n_estimators': 122, 'learning_rate': 0.6858420128883721, 'colsample_bytree': 0.8652385194212884, 'colsample_bynode': 0.9290849457172798, 'colsample_bylevel': 0.5995242880823524, 'reg_lambda': 0.680463502797724, 'max_depth': 3, 'max_leaves': 1, 'max_cat_to_onehot': 2}. Best is trial 3 with value: 347.17865923346363.
[I 2023-03-07 17:06:50,553] Trial 14 finished with value: 404.27918035137174 and parameters: {'n_estimators': 107, 'learning_rate': 0.6786038285626781, 'colsample_bytree': 0.8985738745433061, 'colsample_bynode': 0.8788636064079021, 'colsample_bylevel': 0.6512917871262596, 'reg_lambda': 0.4070172861475452, 'max_depth': 2, 'max_leaves': 0, 'max_cat_to_onehot': 4}. Best is trial 3 with value: 347.17865923346363.
[I 2023-03-07 17:08:51,473] Trial 17 finished with value: 706.6318215989428 and parameters: {'n_estimators': 124, 'learning_rate': 0.6521667386987423, 'colsample_bytree': 0.9569516961979767, 'colsample_bynode': 0.6840764626052795, 'colsample_bylevel': 0.996675355923968, 'reg_lambda': 0.7417906836550517, 'max_depth': 3, 'max_leaves': 1, 'max_cat_to_onehot': 2}. Best is trial 3 with value: 347.17865923346363.
[I 2023-03-07 17:10:48,426] Trial 15 finished with value: 396.7710119396819 and parameters: {'n_estimators': 116, 'learning_rate': 0.6568316445477205, 'colsample_bytree': 0.7165342538651144, 'colsample_bynode': 0.9883170825676679, 'colsample_bylevel': 0.9140068314391551, 'reg_lambda': 0.7193193939364403, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 5}. Best is trial 3 with value: 347.17865923346363.
[I 2023-03-07 17:15:39,851] Trial 18 finished with value: 364.9562497644124 and parameters: {'n_estimators': 107, 'learning_rate': 0.5304671029161999, 'colsample_bytree': 0.8571946091246279, 'colsample_bynode': 0.6773042827636755, 'colsample_bylevel': 0.8849542518665072, 'reg_lambda': 0.007493028502117416, 'max_depth': 5, 'max_leaves': 0, 'max_cat_to_onehot': 7}. Best is trial 3 with value: 347.17865923346363.
[I 2023-03-07 17:15:55,535] Trial 19 finished with value: 332.6632716628251 and parameters: {'n_estimators': 125, 'learning_rate': 0.5179308871980247, 'colsample_bytree': 0.9333468818076025, 'colsample_bynode': 0.9820769284295785, 'colsample_bylevel': 0.9936380351196751, 'reg_lambda': 0.9396137502081786, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 1}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:18:10,216] Trial 21 finished with value: 361.50503421399253 and parameters: {'n_estimators': 122, 'learning_rate': 0.6439622056734687, 'colsample_bytree': 0.9626561232968579, 'colsample_bynode': 0.7840681545368369, 'colsample_bylevel': 0.7308561827784634, 'reg_lambda': 0.928143265667958, 'max_depth': 5, 'max_leaves': 0, 'max_cat_to_onehot': 10}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:18:42,206] Trial 20 finished with value: 342.1026645473983 and parameters: {'n_estimators': 121, 'learning_rate': 0.6529159475197441, 'colsample_bytree': 0.9834214974086493, 'colsample_bynode': 0.7949495835945106, 'colsample_bylevel': 0.7226170146077365, 'reg_lambda': 0.911291579248073, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 10}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:19:31,722] Trial 23 finished with value: 346.9275870448841 and parameters: {'n_estimators': 85, 'learning_rate': 0.6358893299463163, 'colsample_bytree': 0.9869880543633085, 'colsample_bynode': 0.9929693455450609, 'colsample_bylevel': 0.727254158918082, 'reg_lambda': 0.9222894505811144, 'max_depth': 5, 'max_leaves': 0, 'max_cat_to_onehot': 10}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:19:54,950] Trial 22 finished with value: 361.47155123065517 and parameters: {'n_estimators': 122, 'learning_rate': 0.6508063712152334, 'colsample_bytree': 0.9938637069385522, 'colsample_bynode': 0.7757958736891309, 'colsample_bylevel': 0.7468280750660689, 'reg_lambda': 0.9097148081121074, 'max_depth': 5, 'max_leaves': 0, 'max_cat_to_onehot': 10}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:20:58,361] Trial 25 finished with value: 408.6774513132928 and parameters: {'n_estimators': 86, 'learning_rate': 0.515541517267973, 'colsample_bytree': 0.9907798292655484, 'colsample_bynode': 0.9988128879902017, 'colsample_bylevel': 0.7108614960548679, 'reg_lambda': 0.8409020281219484, 'max_depth': 2, 'max_leaves': 0, 'max_cat_to_onehot': 4}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:23:54,731] Trial 26 finished with value: 424.04393242272 and parameters: {'n_estimators': 84, 'learning_rate': 0.530392254355344, 'colsample_bytree': 0.9643413230352433, 'colsample_bynode': 0.9889126589221562, 'colsample_bylevel': 0.5064955857542202, 'reg_lambda': 0.9070770234921224, 'max_depth': 2, 'max_leaves': 0, 'max_cat_to_onehot': 4}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:26:24,982] Trial 27 finished with value: 394.6020176730988 and parameters: {'n_estimators': 84, 'learning_rate': 0.5084881261772749, 'colsample_bytree': 0.7397385662629471, 'colsample_bynode': 0.9781193358748361, 'colsample_bylevel': 0.9399009000977889, 'reg_lambda': 0.8460962712296195, 'max_depth': 5, 'max_leaves': 0, 'max_cat_to_onehot': 4}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:28:39,758] Trial 24 finished with value: 350.16844266361414 and parameters: {'n_estimators': 85, 'learning_rate': 0.641263902427605, 'colsample_bytree': 0.9906404760088948, 'colsample_bynode': 0.7351328035148854, 'colsample_bylevel': 0.7080260008072945, 'reg_lambda': 0.8826574033713829, 'max_depth': 5, 'max_leaves': 0, 'max_cat_to_onehot': 10}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:32:51,541] Trial 32 finished with value: 334.7178146545731 and parameters: {'n_estimators': 87, 'learning_rate': 0.6177710394096174, 'colsample_bytree': 0.9971142356103272, 'colsample_bynode': 0.999337125705259, 'colsample_bylevel': 0.7844983384396753, 'reg_lambda': 0.9750617490250187, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 9}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:32:54,248] Trial 29 finished with value: 344.98104370482696 and parameters: {'n_estimators': 85, 'learning_rate': 0.5145363730262912, 'colsample_bytree': 0.936059701351899, 'colsample_bynode': 0.9767686493778588, 'colsample_bylevel': 0.9428828818692536, 'reg_lambda': 0.9881518636158998, 'max_depth': 5, 'max_leaves': 0, 'max_cat_to_onehot': 1}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:33:18,633] Trial 31 finished with value: 347.2728598179241 and parameters: {'n_estimators': 84, 'learning_rate': 0.5016196931099021, 'colsample_bytree': 0.992901031062656, 'colsample_bynode': 0.9977947724130465, 'colsample_bylevel': 0.9424809095329362, 'reg_lambda': 0.9849898971806041, 'max_depth': 5, 'max_leaves': 0, 'max_cat_to_onehot': 1}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:34:16,493] Trial 30 finished with value: 347.09421735966504 and parameters: {'n_estimators': 85, 'learning_rate': 0.5066468441376468, 'colsample_bytree': 0.9979021903407257, 'colsample_bynode': 0.999457060819317, 'colsample_bylevel': 0.8105217181819686, 'reg_lambda': 0.9680967506954332, 'max_depth': 5, 'max_leaves': 0, 'max_cat_to_onehot': 1}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:36:05,968] Trial 28 finished with value: 349.2355101307413 and parameters: {'n_estimators': 86, 'learning_rate': 0.5119745649007706, 'colsample_bytree': 0.9920526855316918, 'colsample_bynode': 0.7656561436357512, 'colsample_bylevel': 0.8257647574682665, 'reg_lambda': 0.9658520490764436, 'max_depth': 5, 'max_leaves': 0, 'max_cat_to_onehot': 8}. Best is trial 19 with value: 332.6632716628251.
[I 2023-03-07 17:40:08,442] Trial 33 finished with value: 327.95215804440556 and parameters: {'n_estimators': 88, 'learning_rate': 0.620483334826477, 'colsample_bytree': 0.9924092433149426, 'colsample_bynode': 0.9647169421797864, 'colsample_bylevel': 0.8065679903678107, 'reg_lambda': 0.9884315287794484, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:40:27,856] Trial 34 finished with value: 329.14326906256315 and parameters: {'n_estimators': 90, 'learning_rate': 0.6228651214507024, 'colsample_bytree': 0.9158786349278677, 'colsample_bynode': 0.9645944709323654, 'colsample_bylevel': 0.7978162832106585, 'reg_lambda': 0.9940823913367136, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:42:16,555] Trial 36 finished with value: 329.3534904185832 and parameters: {'n_estimators': 90, 'learning_rate': 0.6188281290237252, 'colsample_bytree': 0.9079641709344893, 'colsample_bynode': 0.9574228737027115, 'colsample_bylevel': 0.7939521215359945, 'reg_lambda': 0.9903287229522949, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:44:23,453] Trial 35 finished with value: 329.6542225715352 and parameters: {'n_estimators': 89, 'learning_rate': 0.6211958234747134, 'colsample_bytree': 0.9097715036380705, 'colsample_bynode': 0.9616433719868638, 'colsample_bylevel': 0.7990265911099529, 'reg_lambda': 0.9634536456684167, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:44:26,524] Trial 38 finished with value: 706.8160057509638 and parameters: {'n_estimators': 116, 'learning_rate': 0.6200400720122841, 'colsample_bytree': 0.920004749191604, 'colsample_bynode': 0.9073805772691944, 'colsample_bylevel': 0.8209102585028519, 'reg_lambda': 0.9946555577146179, 'max_depth': 6, 'max_leaves': 1, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:48:09,521] Trial 37 finished with value: 330.05628698024583 and parameters: {'n_estimators': 116, 'learning_rate': 0.617506489985915, 'colsample_bytree': 0.9093786502146559, 'colsample_bynode': 0.9538451804712185, 'colsample_bylevel': 0.7862191448972011, 'reg_lambda': 0.9947479032003648, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:48:32,873] Trial 39 finished with value: 706.8102226257521 and parameters: {'n_estimators': 113, 'learning_rate': 0.6208013673568162, 'colsample_bytree': 0.9119931632316574, 'colsample_bynode': 0.9437625606948115, 'colsample_bylevel': 0.7988206330609289, 'reg_lambda': 0.9982392253017746, 'max_depth': 6, 'max_leaves': 1, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:51:29,697] Trial 41 finished with value: 329.44935533642104 and parameters: {'n_estimators': 90, 'learning_rate': 0.6243719855666229, 'colsample_bytree': 0.9175043438898698, 'colsample_bynode': 0.9443303454030586, 'colsample_bylevel': 0.8351600038209454, 'reg_lambda': 0.9980697985503726, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:52:03,370] Trial 40 finished with value: 329.2017891618069 and parameters: {'n_estimators': 91, 'learning_rate': 0.6222123945680287, 'colsample_bytree': 0.929384197113956, 'colsample_bynode': 0.952207421777972, 'colsample_bylevel': 0.8098775187878909, 'reg_lambda': 0.9926906502499317, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:52:07,768] Trial 42 finished with value: 329.17565920374847 and parameters: {'n_estimators': 89, 'learning_rate': 0.6249723984379937, 'colsample_bytree': 0.915905677979786, 'colsample_bynode': 0.9464253547390199, 'colsample_bylevel': 0.7867858837889996, 'reg_lambda': 0.8238080425851559, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 9}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:54:47,898] Trial 47 finished with value: 706.9687982220186 and parameters: {'n_estimators': 90, 'learning_rate': 0.6029988601894867, 'colsample_bytree': 0.9008183912373705, 'colsample_bynode': 0.9532761848667144, 'colsample_bylevel': 0.8577133050021863, 'reg_lambda': 0.8053846370684328, 'max_depth': 6, 'max_leaves': 1, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:54:48,547] Trial 44 finished with value: 706.8632317901731 and parameters: {'n_estimators': 75, 'learning_rate': 0.6141931244855309, 'colsample_bytree': 0.9148526654033013, 'colsample_bynode': 0.9451087052611272, 'colsample_bylevel': 0.7928643937784344, 'reg_lambda': 0.807709179221934, 'max_depth': 6, 'max_leaves': 1, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:54:52,254] Trial 43 finished with value: 329.7038555903672 and parameters: {'n_estimators': 90, 'learning_rate': 0.6119286961400929, 'colsample_bytree': 0.9120306974990947, 'colsample_bynode': 0.951272962272417, 'colsample_bylevel': 0.7797170113792572, 'reg_lambda': 0.8047648146882534, 'max_depth': 6, 'max_leaves': 0, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:55:39,433] Trial 45 finished with value: 706.9469383707167 and parameters: {'n_estimators': 90, 'learning_rate': 0.605140867012063, 'colsample_bytree': 0.9092213508608811, 'colsample_bynode': 0.9499553038992636, 'colsample_bylevel': 0.8562023542976394, 'reg_lambda': 0.8107447485305221, 'max_depth': 6, 'max_leaves': 1, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:56:47,544] Trial 46 finished with value: 706.7664285941967 and parameters: {'n_estimators': 91, 'learning_rate': 0.6269426025673392, 'colsample_bytree': 0.9092765139442558, 'colsample_bynode': 0.9495606210766792, 'colsample_bylevel': 0.8496793179407985, 'reg_lambda': 0.8104898145023496, 'max_depth': 6, 'max_leaves': 1, 'max_cat_to_onehot': 8}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:58:20,294] Trial 48 finished with value: 706.9832565193495 and parameters: {'n_estimators': 91, 'learning_rate': 0.6016254514349906, 'colsample_bytree': 0.8945464355623441, 'colsample_bynode': 0.9272452260354069, 'colsample_bylevel': 0.8482544059012241, 'reg_lambda': 0.8215660057995322, 'max_depth': 4, 'max_leaves': 1, 'max_cat_to_onehot': 7}. Best is trial 33 with value: 327.95215804440556.
[I 2023-03-07 17:58:52,103] Trial 49 finished with value: 706.9486029461505 and parameters: {'n_estimators': 90, 'learning_rate': 0.6049747681420412, 'colsample_bytree': 0.8229040978888252, 'colsample_bynode': 0.9110794523915683, 'colsample_bylevel': 0.8625972325012149, 'reg_lambda': 0.8122361434704258, 'max_depth': 4, 'max_leaves': 1, 'max_cat_to_onehot': 7}. Best is trial 33 with value: 327.95215804440556.
Total time:  1:25:10.974223
CPU times: user 23min 24s, sys: 7min 54s, total: 31min 19s
Wall time: 1h 25min 10s

Cluster cleanup

# Tear down running clusters
for client, _ in clusters.values():
    client.shutdown()

Results

len(study.trials)
50
study.best_params
{'n_estimators': 88,
 'learning_rate': 0.620483334826477,
 'colsample_bytree': 0.9924092433149426,
 'colsample_bynode': 0.9647169421797864,
 'colsample_bylevel': 0.8065679903678107,
 'reg_lambda': 0.9884315287794484,
 'max_depth': 6,
 'max_leaves': 0,
 'max_cat_to_onehot': 8}
study.best_value
327.95215804440556
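Beyond the best trial, Optuna can also estimate how much each hyperparameter influenced the objective across trials. Here is a brief sketch using optuna.importance.get_param_importances, which uses a fANOVA-based evaluator by default:

import optuna

# Estimate each hyperparameter's influence on the objective
importances = optuna.importance.get_param_importances(study)
for name, value in importances.items():
    print(f"{name}: {value:.3f}")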
import joblib

# Uncomment to save the results of your study to examine later
# joblib.dump(study, "study_many_threads.pickle")

# Load a previously saved study (this assumes you saved one above)
study = joblib.load("study_many_threads.pickle")
import matplotlib.pyplot as plt

fig = optuna.visualization.matplotlib.plot_optimization_history(study)
fig.legend(loc="upper right")
/var/folders/1y/ydztfpnd11b6qmvbb8_x56jh0000gn/T/ipykernel_19277/186097083.py:1: ExperimentalWarning: plot_optimization_history is experimental (supported from v2.2.0). The interface can change in the future.
  fig = optuna.visualization.matplotlib.plot_optimization_history(study)
[Figure: optimization history plot of the study]