From 8768ccfd1a9f45563683299c6ca3d6638e0ef124 Mon Sep 17 00:00:00 2001
From: Amagash
Date: Mon, 30 Aug 2021 20:46:20 +0200
Subject: [PATCH] [Lesson 19] Add AutoML config part

---
 5-Data-Science-In-Cloud/19-tbd/README.md | 63 ++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/5-Data-Science-In-Cloud/19-tbd/README.md b/5-Data-Science-In-Cloud/19-tbd/README.md
index 94597c6..3921b60 100644
--- a/5-Data-Science-In-Cloud/19-tbd/README.md
+++ b/5-Data-Science-In-Cloud/19-tbd/README.md
@@ -11,8 +11,10 @@ Table of contents:
     - [2.1 Create an Azure ML workspace](#21-create-an-azure-ml-workspace)
     - [2.2 Create a compute instance](#22-create-a-compute-instance)
     - [2.3 Loading the Dataset](#23-loading-the-dataset)
-    - [2.3 Creating Notebooks](#23-creating-notebooks)
-    - [2.4 Training a model with the Azure ML SDK](#24-training-a-model-with-the-azure-ml-sdk)
+    - [2.4 Creating Notebooks](#24-creating-notebooks)
+    - [2.5 Training a model with the Azure ML SDK](#25-training-a-model-with-the-azure-ml-sdk)
+      - [2.5.1 Set up the workspace, experiment, compute cluster and dataset](#251-set-up-the-workspace-experiment-compute-cluster-and-dataset)
+      - [2.5.2 AutoML Configuration](#252-automl-configuration)
   - [🚀 Challenge](#-challenge)
   - [Post-Lecture Quiz](#post-lecture-quiz)
   - [Review & Self Study](#review--self-study)
@@ -70,7 +72,7 @@ Congratulations, you have just created a compute instance! We will use this comp
 ### 2.3 Loading the Dataset
 Refer the [previous lesson](../18-tbd/README.md) in the section **2.3 Loading the Dataset** if you have not uploaded the dataset yet.
 
-### 2.3 Creating Notebooks
+### 2.4 Creating Notebooks
 
 Notebook are a really important part of the data science process. They can be used to Conduct Exploratory Data Analysis (EDA), call out to a computer cluster to train a model, call out to an inference cluster to deploy an endpoint.
 To create a Notebook, we need a compute node that is serving out the jupyter notebook instance. Go back to the [Azure ML workspace](https://ml.azure.com/) and click on Compute instances. In the list of compute instances you should see the [compute instance we created earlier](#22-create-a-compute-instance). 
@@ -84,10 +86,12 @@ To create a Notebook, we need a compute node that is serving out the jupyter not
 
 Now that we have a Notebook, we can start training the model with Azure ML SDK.
 
-### 2.4 Training a model with the Azure ML SDK
+### 2.5 Training a model with the Azure ML SDK
 
 First of all, if you ever have a doubt, refer to the [Azure ML SDK documentation](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py). In contains all the necessary information to understand the modules we are going to see in this lesson.
 
+#### 2.5.1 Set up the workspace, experiment, compute cluster and dataset
+
 You need to load the `workspace` from the configuration file using the following code:
 
 ```python
@@ -104,6 +108,25 @@ experiment = Experiment(ws, experiment_name)
 ```
 To get or create an experiment from a workspace, you request the experiment using the experiment name. Experiment name must be 3-36 characters, start with a letter or a number, and can only contain letters, numbers, underscores, and dashes. If the experiment is not found in the workspace, a new experiment is created.
 
+Next, create a compute cluster for the training using the following code. Note that provisioning the cluster can take a few minutes.
+ +```python +from azureml.core.compute import ComputeTarget, AmlCompute + +aml_name = "heart-f-cluster" +try: + aml_compute = AmlCompute(ws, aml_name) + print('Found existing AML compute context.') +except: + print('Creating new AML compute context.') + aml_config = AmlCompute.provisioning_configuration(vm_size = "Standard_D2_v2", min_nodes=1, max_nodes=3) + aml_compute = AmlCompute.create(ws, name = aml_name, provisioning_configuration = aml_config) + aml_compute.wait_for_completion(show_output = True) + +cts = ws.compute_targets +compute_target = cts[aml_name] +``` + You can get the dataset from the workspace using the dataset name in the following way: ```python @@ -111,6 +134,38 @@ dataset = ws.datasets['heart-failure-records'] df = dataset.to_pandas_dataframe() df.describe() ``` +#### 2.5.2 AutoML Configuration + +To set the AutoML configuration, use the [AutoMLConfig class](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig(class)?view=azure-ml-py). + +As described in the doc, there are a lot of settings with which you can play with. For this project, we will use the following settings: + +- `experiment_timeout_minutes`: The maximum amount of time (in minutes) that the experiment is allowed to run before it is automatically stopped and results are automatically made available +- `max_concurrent_iterations`: The maximum number of concurrent training iterations allowed for the experiment. +- `primary_metric`: The primary metric used to determine the experiment's status. 
+ +```python +from azureml.train.automl import AutoMLConfig + +project_folder = './aml-project' + +automl_settings = { + "experiment_timeout_minutes": 20, + "max_concurrent_iterations": 3, + "primary_metric" : 'AUC_weighted' +} + +automl_config = AutoMLConfig(compute_target=compute_target, + task = "classification", + training_data=dataset, + label_column_name="DEATH_EVENT", + path = project_folder, + enable_early_stopping= True, + featurization= 'auto', + debug_log = "automl_errors.log", + **automl_settings + ) +``` ## 🚀 Challenge