From 8768ccfd1a9f45563683299c6ca3d6638e0ef124 Mon Sep 17 00:00:00 2001
From: Amagash <tiffanysouterre@gmail.com>
Date: Mon, 30 Aug 2021 20:46:20 +0200
Subject: [PATCH] [Lesson 19] Add AutoML config part

---
 5-Data-Science-In-Cloud/19-tbd/README.md | 63 ++++++++++++++++++++++--
 1 file changed, 59 insertions(+), 4 deletions(-)

diff --git a/5-Data-Science-In-Cloud/19-tbd/README.md b/5-Data-Science-In-Cloud/19-tbd/README.md
index 94597c6..3921b60 100644
--- a/5-Data-Science-In-Cloud/19-tbd/README.md
+++ b/5-Data-Science-In-Cloud/19-tbd/README.md
@@ -11,8 +11,10 @@ Table of contents:
     - [2.1 Create an Azure ML workspace](#21-create-an-azure-ml-workspace)
     - [2.2 Create a compute instance](#22-create-a-compute-instance)
     - [2.3 Loading the Dataset](#23-loading-the-dataset)
-    - [2.3 Creating Notebooks](#23-creating-notebooks)
-    - [2.4 Training a model with the Azure ML SDK](#24-training-a-model-with-the-azure-ml-sdk)
+    - [2.4 Creating Notebooks](#24-creating-notebooks)
+    - [2.5 Training a model with the Azure ML SDK](#25-training-a-model-with-the-azure-ml-sdk)
+      - [2.5.1 Setup Workspace, experiment, compute cluster and dataset](#251-setup-workspace-experiment-compute-cluster-and-dataset)
+      - [2.5.2 AutoML Configuration](#252-automl-configuration)
   - [🚀 Challenge](#-challenge)
   - [Post-Lecture Quiz](#post-lecture-quiz)
   - [Review & Self Study](#review--self-study)
@@ -70,7 +72,7 @@ Congratulations, you have just created a compute instance! We will use this comp
 ### 2.3 Loading the Dataset
 Refer the [previous lesson](../18-tbd/README.md) in the section **2.3 Loading the Dataset** if you have not uploaded the dataset yet.
 
-### 2.3 Creating Notebooks
+### 2.4 Creating Notebooks
 Notebook are a really important part of the data science process. They can be used to Conduct Exploratory Data Analysis (EDA), call out to a computer cluster to train a model, call out to an inference cluster to deploy an endpoint. 
 
 To create a Notebook, we need a compute node that is serving out the jupyter notebook instance. Go back to the [Azure ML workspace](https://ml.azure.com/) and click on Compute instances. In the list of compute instances you should see the [compute instance we created earlier](#22-create-a-compute-instance). 
@@ -84,10 +86,12 @@ To create a Notebook, we need a compute node that is serving out the jupyter not
 
 Now that we have a Notebook, we can start training the model with Azure ML SDK.
 
-### 2.4 Training a model with the Azure ML SDK
+### 2.5 Training a model with the Azure ML SDK
 
 First of all, if you ever have a doubt, refer to the [Azure ML SDK documentation](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py). In contains all the necessary information to understand the modules we are going to see in this lesson.
 
+#### 2.5.1 Setup Workspace, experiment, compute cluster and dataset
+
 You need to load the `workspace` from the configuration file using the following code:
 
 ```python
@@ -104,6 +108,25 @@ experiment = Experiment(ws, experiment_name)
 ```
 To get or create an experiment from a workspace, you request the experiment using the experiment name. Experiment name must be 3-36 characters, start with a letter or a number, and can only contain letters, numbers, underscores, and dashes. If the experiment is not found in the workspace, a new experiment is created.
 
+Now you need to create a compute cluster for the training using the following code. Note that this step can take a few minutes. 
+
+```python
+from azureml.core.compute import ComputeTarget, AmlCompute
+
+aml_name = "heart-f-cluster"
+try:
+    aml_compute = AmlCompute(ws, aml_name)
+    print('Found existing AML compute context.')
+except Exception:  # not found in the workspace; a bare except would also trap KeyboardInterrupt
+    print('Creating new AML compute context.')
+    aml_config = AmlCompute.provisioning_configuration(vm_size="Standard_D2_v2", min_nodes=1, max_nodes=3)
+    aml_compute = AmlCompute.create(ws, name=aml_name, provisioning_configuration=aml_config)
+    aml_compute.wait_for_completion(show_output=True)
+
+cts = ws.compute_targets
+compute_target = cts[aml_name]
+```
+
 You can get the dataset from the workspace using the dataset name in the following way:
 
 ```python
@@ -111,6 +134,38 @@ dataset = ws.datasets['heart-failure-records']
 df = dataset.to_pandas_dataframe()
 df.describe()
 ```
+#### 2.5.2 AutoML Configuration
+
+To set the AutoML configuration, use the [AutoMLConfig class](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig(class)?view=azure-ml-py).
+
+As described in the documentation, there are many parameters you can experiment with. For this project, we will use the following settings:
+
+- `experiment_timeout_minutes`: The maximum amount of time (in minutes) that the experiment is allowed to run before it is automatically stopped and its results are made available.
+- `max_concurrent_iterations`: The maximum number of concurrent training iterations allowed for the experiment.
+- `primary_metric`: The metric that AutoML optimizes when selecting the best model.
+
+```python
+from azureml.train.automl import AutoMLConfig
+
+project_folder = './aml-project'
+
+automl_settings = {
+    "experiment_timeout_minutes": 20,
+    "max_concurrent_iterations": 3,
+    "primary_metric": 'AUC_weighted'
+}
+
+automl_config = AutoMLConfig(compute_target=compute_target,
+                             task="classification",
+                             training_data=dataset,
+                             label_column_name="DEATH_EVENT",
+                             path=project_folder,
+                             enable_early_stopping=True,
+                             featurization='auto',
+                             debug_log="automl_errors.log",
+                             **automl_settings
+                            )
+```
 
 ## 🚀 Challenge