[Lesson 19] Add AutoML config part

3 years ago · 8768ccfd1a
parent 5daef3ca48
commit 8768ccfd1a
1 changed files with 59 additions and 4 deletions
--- a/5-Data-Science-In-Cloud/19-tbd/README.md
+++ b/5-Data-Science-In-Cloud/19-tbd/README.md
@@ -11,8 +11,10 @@ Table of contents:
    - [2.1 Create an Azure ML workspace](#21-create-an-azure-ml-workspace)
    - [2.2 Create a compute instance](#22-create-a-compute-instance)
    - [2.3 Loading the Dataset](#23-loading-the-dataset)
-    - [2.3 Creating Notebooks](#23-creating-notebooks)
+    - [2.4 Creating Notebooks](#24-creating-notebooks)
-    - [2.4 Training a model with the Azure ML SDK](#24-training-a-model-with-the-azure-ml-sdk)
+    - [2.5 Training a model with the Azure ML SDK](#25-training-a-model-with-the-azure-ml-sdk)
      - [2.5.1 Setup Workspace, experiment, compute cluster and dataset](#251-setup-workspace-experiment-compute-cluster-and-dataset)
      - [2.5.2 AutoML Configuration](#252-automl-configuration)
  - [🚀 Challenge](#-challenge)
  - [Post-Lecture Quiz](#post-lecture-quiz)
  - [Review & Self Study](#review--self-study)
@@ -70,7 +72,7 @@ Congratulations, you have just created a compute instance! We will use this comp
 ### 2.3 Loading the Dataset
 Refer to the [previous lesson](../18-tbd/README.md), section **2.3 Loading the Dataset**, if you have not uploaded the dataset yet.
-### 2.3 Creating Notebooks
+### 2.4 Creating Notebooks
 Notebooks are an important part of the data science process. They can be used to conduct Exploratory Data Analysis (EDA), call out to a compute cluster to train a model, or call out to an inference cluster to deploy an endpoint.
 To create a Notebook, we need a compute node that serves the Jupyter notebook instance. Go back to the [Azure ML workspace](https://ml.azure.com/) and click on Compute instances. In the list of compute instances, you should see the [compute instance we created earlier](#22-create-a-compute-instance).
@@ -84,10 +86,12 @@ To create a Notebook, we need a compute node that is serving out the jupyter not
 Now that we have a Notebook, we can start training the model with the Azure ML SDK.
-### 2.4 Training a model with the Azure ML SDK
+### 2.5 Training a model with the Azure ML SDK
 If you are ever in doubt, refer to the [Azure ML SDK documentation](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py). It contains all the necessary information to understand the modules we are going to see in this lesson.
 #### 2.5.1 Setup Workspace, experiment, compute cluster and dataset
 You need to load the `workspace` from the configuration file using the following code:
 ```python
@@ -104,6 +108,25 @@ experiment = Experiment(ws, experiment_name)
 ```
 To get or create an experiment from a workspace, you request the experiment by name. The experiment name must be 3-36 characters long, start with a letter or a number, and contain only letters, numbers, underscores, and dashes. If the experiment is not found in the workspace, a new experiment is created.
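The naming rule above can be checked locally before calling the SDK. The following is a minimal sketch using only the Python standard library. `is_valid_experiment_name` is a hypothetical helper, not part of the Azure ML SDK, and the regular expression is a transcription of the rule as stated here, assuming "letters" means ASCII letters.

```python
import re

# Transcription of the rule stated above: 3-36 characters, first character a
# letter or digit, remaining characters letters, digits, underscores or dashes.
# Assumption: ASCII letters only; this pattern is not taken from the SDK itself.
_EXPERIMENT_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{2,35}")


def is_valid_experiment_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rule described above."""
    return _EXPERIMENT_NAME_RE.fullmatch(name) is not None
```

For instance, a hypothetical name such as `"aml-experiment"` passes the check, while `"ab"` (too short) and `"_experiment"` (starts with an underscore) do not.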
 Now you need to create a compute cluster for the training using the following code. Note that this step can take a few minutes. 
 ```python
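 from azureml.core.compute import ComputeTarget, AmlCompute
 from azureml.core.compute_target import ComputeTargetException
 aml_name = "heart-f-cluster"
 try:
     # Reuse the cluster if it already exists in the workspace.
     aml_compute = AmlCompute(ws, aml_name)
     print('Found existing AML compute context.')
 except ComputeTargetException:
     # The lookup failed, so provision a new cluster (this can take a few minutes).
     print('Creating new AML compute context.')
     aml_config = AmlCompute.provisioning_configuration(vm_size="Standard_D2_v2", min_nodes=1, max_nodes=3)
     aml_compute = AmlCompute.create(ws, name=aml_name, provisioning_configuration=aml_config)
     aml_compute.wait_for_completion(show_output=True)
 # Look up the compute target by name for use in the AutoML configuration.
 cts = ws.compute_targets
 compute_target = cts[aml_name]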
 ```
 You can get the dataset from the workspace using the dataset name in the following way:
 ```python
@@ -111,6 +134,38 @@ dataset = ws.datasets['heart-failure-records']
 df = dataset.to_pandas_dataframe()
 df.describe()
 ```
 #### 2.5.2 AutoML Configuration
 To set the AutoML configuration, use the [AutoMLConfig class](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig(class)?view=azure-ml-py).
 As described in the documentation, there are many settings you can experiment with. For this project, we will use the following settings:
 - `experiment_timeout_minutes`: The maximum amount of time (in minutes) that the experiment is allowed to run before it is automatically stopped and the results are made available.
 - `max_concurrent_iterations`: The maximum number of concurrent training iterations allowed for the experiment.
 - `primary_metric`: The metric that AutoML optimizes when selecting the best model.
 ```python
 from azureml.train.automl import AutoMLConfig
 project_folder = './aml-project'
 automl_settings = {
     "experiment_timeout_minutes": 20,
     "max_concurrent_iterations": 3,
     "primary_metric": "AUC_weighted",
 }
 automl_config = AutoMLConfig(
     compute_target=compute_target,
     task="classification",
     training_data=dataset,
     label_column_name="DEATH_EVENT",
     path=project_folder,
     enable_early_stopping=True,
     featurization="auto",
     debug_log="automl_errors.log",
     **automl_settings,
 )
 ```
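The AutoML settings interact with the compute cluster created earlier. According to the `AutoMLConfig` documentation, an AmlCompute cluster runs one iteration per node, so a `max_concurrent_iterations` value larger than the cluster's `max_nodes` would only cause iterations to queue. The sketch below is a plain-Python sanity check that needs no Azure connection. `check_automl_settings` is a hypothetical helper introduced here for illustration, not an SDK function, and the positivity check on the timeout is an assumption rather than a documented limit.

```python
def check_automl_settings(settings: dict, cluster_max_nodes: int) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []

    # Assumption: a timeout, when given, should be a positive number of minutes.
    timeout = settings.get("experiment_timeout_minutes")
    if timeout is not None and timeout <= 0:
        problems.append("experiment_timeout_minutes must be positive")

    # One iteration runs per cluster node; the default of 1 mirrors the
    # documented default for max_concurrent_iterations.
    concurrency = settings.get("max_concurrent_iterations", 1)
    if concurrency > cluster_max_nodes:
        problems.append(
            f"max_concurrent_iterations ({concurrency}) exceeds "
            f"cluster max_nodes ({cluster_max_nodes}); extra iterations would queue"
        )
    return problems


# Values copied from this lesson: the settings dictionary and max_nodes=3.
automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 3,
    "primary_metric": "AUC_weighted",
}
print(check_automl_settings(automl_settings, cluster_max_nodes=3))
```

With the values used in this lesson the check reports no problems, because three concurrent iterations fit on a cluster that scales up to three nodes.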
 ## 🚀 Challenge