scikit-learn with Coiled Functions#
scikit-learn is a library for predictive data analysis. In this guide, you’ll learn how to leverage scikit-learn’s built-in parallelism by training a model on a cloud VM with many cores using Coiled Serverless Functions.
Before you start#
You’ll first need to install the necessary packages. For the purposes of this example, we’ll do this in a new virtual environment, but you could also install them in whatever environment you’re already using for your project.
```shell
conda create -n coiled-sklearn-example -c conda-forge python=3.10 scikit-learn
conda activate coiled-sklearn-example
```
You could also use pip for everything, or any other package manager you prefer; conda isn’t required.
When you run your code with Coiled Functions, Coiled will automatically replicate your local coiled-sklearn-example environment to your cluster.
About the Data#
This example will use randomized data to enable users to easily reproduce our results.
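The randomized data comes from scikit-learn’s make_classification, which generates a reproducible synthetic classification dataset. A scaled-down sketch (the training example later in this guide uses the same call with 2,000,000 samples):

```python
from sklearn.datasets import make_classification

# Synthetic binary classification data; random_state=0 makes the
# dataset identical across runs, so results are reproducible.
X, y = make_classification(
    n_samples=10_000, n_features=30, random_state=0, shuffle=False
)
print(X.shape)  # (10000, 30)
```

Because the data is generated rather than downloaded, there is nothing to upload to the cloud VM; the worker generates the same dataset itself.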
About the query#
We will train a RandomForestClassifier and return the trained model to our local machine.
Dispatch the computation to a VM in the cloud#
We decorate the function that does the work with a Coiled-specific decorator that offloads the computation to the cloud.
```python
import coiled

@coiled.function(
    vm_type="c6i.32xlarge",  # 128 cores, 256 GB of RAM
    keepalive="5 minutes",   # keep alive to run multiple queries if necessary
)
```
This will offload our workload to an EC2 instance with 128 cores. scikit-learn makes it easy to train a model on multiple cores, so the training step benefits directly from a bigger machine.
Train the model#
We will offload the training step to a cloud-hosted VM.
```python
import coiled
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification


@coiled.function(
    vm_type="c6i.32xlarge",  # 128 cores, compute optimized
    keepalive="5 minutes",   # keep alive to train more models if necessary
)
def train():
    # Generate a reproducible synthetic dataset and fit the model
    # on all available cores (n_jobs=-1).
    X, y = make_classification(
        n_samples=2_000_000, n_features=30, random_state=0, shuffle=False
    )
    clf = RandomForestClassifier(random_state=0, n_jobs=-1)
    clf.fit(X, y)
    return clf


clf = train()
```
This is where the decorator comes in: Coiled Functions offloads the computation to AWS and returns the trained model. After offloading only the expensive training step, we can use the model locally for inference. The provisioned instance stays alive for 5 minutes, so we can reconnect if we need to run another query.
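Once the fitted model is back on your machine, inference runs entirely locally, with no cloud resources involved. A minimal sketch, using a small locally trained model as a stand-in for the object returned by the decorated train() function:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the model returned from the cloud; in practice you
# would write `clf = train()` and Coiled would ship the fitted
# RandomForestClassifier back to your machine.
X, y = make_classification(
    n_samples=1_000, n_features=30, random_state=0, shuffle=False
)
clf = RandomForestClassifier(random_state=0, n_jobs=-1).fit(X, y)

# Local inference on new samples, no VM needed.
X_new, _ = make_classification(n_samples=5, n_features=30, random_state=1)
preds = clf.predict(X_new)
print(preds.shape)  # (5,)
```

The returned model is an ordinary scikit-learn estimator, so predict, predict_proba, and model persistence all work exactly as they would for a locally trained model.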