[Lesson 19] Add AutoML config part

pull/60/head · Amagash · 3 years ago · parent 5daef3ca48 · commit 8768ccfd1a
Table of contents:

- [2.1 Create an Azure ML workspace](#21-create-an-azure-ml-workspace)
- [2.2 Create a compute instance](#22-create-a-compute-instance)
- [2.3 Loading the Dataset](#23-loading-the-dataset)
- [2.4 Creating Notebooks](#24-creating-notebooks)
- [2.5 Training a model with the Azure ML SDK](#25-training-a-model-with-the-azure-ml-sdk)
  - [2.5.1 Setup Workspace, experiment, compute cluster and dataset](#251-setup-workspace-experiment-compute-cluster-and-dataset)
  - [2.5.2 AutoML Configuration](#252-automl-configuration)
- [🚀 Challenge](#-challenge)
- [Post-Lecture Quiz](#post-lecture-quiz)
- [Review & Self Study](#review--self-study)
Congratulations, you have just created a compute instance! We will use this compute instance to create a Notebook.
### 2.3 Loading the Dataset

Refer to the [previous lesson](../18-tbd/README.md), section **2.3 Loading the Dataset**, if you have not uploaded the dataset yet.

### 2.4 Creating Notebooks

Notebooks are a really important part of the data science process. They can be used to conduct Exploratory Data Analysis (EDA), call out to a compute cluster to train a model, or call out to an inference cluster to deploy an endpoint.

To create a Notebook, we need a compute node that serves the Jupyter Notebook instance. Go back to the [Azure ML workspace](https://ml.azure.com/) and click on Compute instances. In the list of compute instances you should see the [compute instance we created earlier](#22-create-a-compute-instance).
Now that we have a Notebook, we can start training the model with the Azure ML SDK.

### 2.5 Training a model with the Azure ML SDK

First of all, if you ever have a doubt, refer to the [Azure ML SDK documentation](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py). It contains all the information necessary to understand the modules we are going to see in this lesson.
#### 2.5.1 Setup Workspace, experiment, compute cluster and dataset
You need to load the `workspace` from the configuration file using the following code:

```python
from azureml.core import Workspace

# Reads the config.json downloaded from the Azure portal
ws = Workspace.from_config()
```

You can then get or create an experiment in that workspace:

```python
from azureml.core import Experiment

experiment_name = 'aml-experiment'
experiment = Experiment(ws, experiment_name)
```

To get or create an experiment from a workspace, you request the experiment using the experiment name. The experiment name must be 3-36 characters, start with a letter or a number, and can only contain letters, numbers, underscores, and dashes. If the experiment is not found in the workspace, a new experiment is created.
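Those naming rules can be checked up front before calling the SDK; here is a quick sketch (`is_valid_experiment_name` is a hypothetical helper, not part of Azure ML):

```python
import re

# Hypothetical helper mirroring the naming rules above: 3-36 characters,
# starting with a letter or digit, followed by letters, digits,
# underscores, or dashes.
def is_valid_experiment_name(name: str) -> bool:
    return re.fullmatch(r'[A-Za-z0-9][A-Za-z0-9_-]{2,35}', name) is not None
```
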
Now you need to create a compute cluster for training using the following code. Note that this step can take a few minutes.

```python
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

aml_name = "heart-f-cluster"
try:
    # Reuse the cluster if it already exists in the workspace
    aml_compute = AmlCompute(ws, aml_name)
    print('Found existing AML compute context.')
except ComputeTargetException:
    print('Creating new AML compute context.')
    aml_config = AmlCompute.provisioning_configuration(vm_size="Standard_D2_v2", min_nodes=1, max_nodes=3)
    aml_compute = AmlCompute.create(ws, name=aml_name, provisioning_configuration=aml_config)
    aml_compute.wait_for_completion(show_output=True)

cts = ws.compute_targets
compute_target = cts[aml_name]
```
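The `try`/`except` around the cluster lookup is a get-or-create pattern: look the resource up by name, and only provision it if the lookup fails. A stdlib-only sketch of the same idea (the registry and factory here are illustrative stand-ins, not Azure ML objects):

```python
# Generic get-or-create pattern, mirroring the cluster lookup above.
_registry = {}  # stand-in for the workspace's compute targets

def get_or_create(name, factory):
    try:
        resource = _registry[name]       # reuse if it already exists
    except KeyError:
        resource = factory()             # otherwise provision it
        _registry[name] = resource
    return resource

cluster = get_or_create("heart-f-cluster", lambda: {"vm_size": "Standard_D2_v2"})
```

Calling `get_or_create` a second time with the same name returns the cached resource instead of provisioning a new one.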
You can get the dataset from the workspace using the dataset name in the following way:

```python
dataset = ws.datasets['heart-failure-records']
df = dataset.to_pandas_dataframe()
df.describe()
```
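Before configuring AutoML, it is worth checking that the label column is present and looking at its class distribution; a stdlib-only stand-in sketch (in the notebook, the rows come from `dataset.to_pandas_dataframe()`, and the values here are illustrative):

```python
from collections import Counter

# Stand-in rows; in the notebook these come from dataset.to_pandas_dataframe()
rows = [
    {'age': 60, 'DEATH_EVENT': 0},
    {'age': 75, 'DEATH_EVENT': 1},
    {'age': 50, 'DEATH_EVENT': 0},
]

# DEATH_EVENT is the label column used in the AutoML configuration
labels = Counter(r['DEATH_EVENT'] for r in rows)
print(labels)
```
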
#### 2.5.2 AutoML Configuration
To set the AutoML configuration, use the [AutoMLConfig class](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig(class)?view=azure-ml-py).
As described in the docs, there are many settings you can configure. For this project, we will use the following settings:
- `experiment_timeout_minutes`: The maximum amount of time (in minutes) that the experiment is allowed to run before it is automatically stopped and its results made available.
- `max_concurrent_iterations`: The maximum number of concurrent training iterations allowed for the experiment.
- `primary_metric`: The primary metric used to determine the experiment's status.
```python
from azureml.train.automl import AutoMLConfig

project_folder = './aml-project'

automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 3,
    "primary_metric": 'AUC_weighted'
}

automl_config = AutoMLConfig(compute_target=compute_target,
                             task="classification",
                             training_data=dataset,
                             label_column_name="DEATH_EVENT",
                             path=project_folder,
                             enable_early_stopping=True,
                             featurization='auto',
                             debug_log="automl_errors.log",
                             **automl_settings)
```
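The `**automl_settings` at the end unpacks the settings dictionary into extra keyword arguments of `AutoMLConfig`, which keeps the tunable settings separate from the fixed ones. A minimal stdlib sketch of the same mechanism (`make_config` is a hypothetical stand-in, not the real `AutoMLConfig`):

```python
# Hypothetical stand-in for AutoMLConfig, to show the **kwargs-merging pattern
def make_config(task, experiment_timeout_minutes=60,
                max_concurrent_iterations=1, primary_metric='accuracy'):
    return {
        'task': task,
        'experiment_timeout_minutes': experiment_timeout_minutes,
        'max_concurrent_iterations': max_concurrent_iterations,
        'primary_metric': primary_metric,
    }

settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 3,
    "primary_metric": 'AUC_weighted'
}

# The dictionary entries become keyword arguments, overriding the defaults
cfg = make_config(task="classification", **settings)
```
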
## 🚀 Challenge
