classifier 3!

pull/34/head
Jen Looper 3 years ago
parent 8db62fff9c
commit ea8b3193ff

@ -10,7 +10,7 @@ In this section of the curriculum, you will be introduced to the base concepts u
### Credits
"Introduction to Machine Learning" was written with ♥️ by a team of folks including ...
"Introduction to Machine Learning" was written with ♥️ by a team of folks including Muhammad Sakib Khan Inan, [Ornella Altunyan](https://twitter.com/ornelladotcom) and [Jen Looper](https://twitter.com/jenlooper)
"The History of Machine Learning" was written with ♥️ by [Jen Looper](https://twitter.com/jenlooper) and [Amy Boyd](https://twitter.com/AmyKateNicho)

@ -235,74 +235,6 @@
"y_pred = model.predict(X_test)\r\n",
"print(classification_report(y_test,y_pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Try different classifiers"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"\r\n",
"C = 10\r\n",
"# Create different classifiers.\r\n",
"classifiers = {\r\n",
" 'L1 logistic': LogisticRegression(C=C, penalty='l1',\r\n",
" solver='saga',\r\n",
" multi_class='multinomial',\r\n",
" max_iter=10000),\r\n",
" 'L2 logistic (Multinomial)': LogisticRegression(C=C, penalty='l2',\r\n",
" solver='saga',\r\n",
" multi_class='multinomial',\r\n",
" max_iter=10000),\r\n",
" 'L2 logistic (OvR)': LogisticRegression(C=C, penalty='l2',\r\n",
" solver='saga',\r\n",
" multi_class='ovr',\r\n",
" max_iter=10000),\r\n",
" 'Linear SVC': SVC(kernel='linear', C=C, probability=True,\r\n",
" random_state=0)\r\n",
"}\r\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Accuracy (train) for L1 logistic: 78.0% \n",
"Accuracy (train) for L2 logistic (Multinomial): 78.5% \n",
"Accuracy (train) for L2 logistic (OvR): 78.7% \n",
"Accuracy (train) for Linear SVC: 78.4% \n"
]
}
],
"source": [
"n_classifiers = len(classifiers)\r\n",
"\r\n",
"for index, (name, classifier) in enumerate(classifiers.items()):\r\n",
" classifier.fit(X_train, np.ravel(y_train))\r\n",
"\r\n",
" y_pred = classifier.predict(X_test)\r\n",
" accuracy = accuracy_score(y_test, y_pred)\r\n",
" print(\"Accuracy (train) for %s: %0.1f%% \" % (name, accuracy * 100))\r\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

@ -1,67 +1,198 @@
# Recipe Classifiers 2
In this second Classification lesson, you will explore more ways to classify numeric data, and the ramifications for choosing one over the other.
## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/21/)
### Prerequisite
We assume that you have completed the previous lessons and have a cleaned dataset called `cleaned_cuisine.csv` in the root `data` folder of this 4-lesson folder.
### Preparation
We have loaded your `notebook.ipynb` file with the cleaned dataset and have divided it into X and y dataframes, ready for the model building process.
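If you are starting from a fresh notebook instead, the setup looks roughly like this minimal sketch (the file path is an assumption; the column names come from the previous lesson's cleaned dataset):
```python
import pandas as pd

# Load the cleaned dataset from the previous lesson (path assumed).
recipes_df = pd.read_csv("../data/cleaned_cuisine.csv")

# y is the cuisine label; X is everything except the label and index columns.
recipes_label_df = recipes_df['cuisine']
recipes_feature_df = recipes_df.drop(['Unnamed: 0', 'cuisine'], axis=1)
```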
## A Classification Map
Previously, you learned about the various options you have when classifying data using Microsoft's cheat sheet. Scikit-Learn offers a similar, but more granular cheat sheet that can further help narrow down your estimators (another term for classifiers):
![ML Map from Scikit-Learn](images/map.png)
> Tip: [visit this map online](https://scikit-learn.org/stable/tutorial/machine_learning_map/) and click along the path to read documentation.
This map is very helpful once you have a clear grasp of your data, as you can 'walk' along its paths to a decision:
- We have >50 samples
- We want to predict a category
- We have labeled data
- We have fewer than 100K samples
- We can choose a Linear SVC
- If that doesn't work, since we have numeric data:
  - We can try a ✨ KNeighbors Classifier
  - If that doesn't work, try ✨ SVC and ✨ Ensemble Classifiers
This is a terrific trail to try. Following this path, we should start by importing some libraries to use:
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, precision_score, confusion_matrix, classification_report, precision_recall_curve
import numpy as np
```
Split your training and test data:
```python
X_train, X_test, y_train, y_test = train_test_split(recipes_feature_df, recipes_label_df, test_size=0.3)
```
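We imported `cross_val_score` above as well; although this lesson sticks to a single train/test split, here is a quick sketch of how you could use cross-validation to sanity-check a classifier before committing to a split:
```python
# Optional sanity check: 5-fold cross-validation on a baseline model.
scores = cross_val_score(SVC(kernel='linear', C=10),
                         recipes_feature_df, np.ravel(recipes_label_df), cv=5)
print("Cross-validated accuracy: %.1f%% (+/- %.1f%%)"
      % (scores.mean() * 100, scores.std() * 100))
```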
## Linear SVC Classifier
Start by creating a dictionary of classifiers. You will add to this dictionary progressively as we test. Start with a Linear SVC:
```python
C = 10
# Create different classifiers.
classifiers = {
    'Linear SVC': SVC(kernel='linear', C=C, probability=True, random_state=0)
}
```
Train your model using the Linear SVC and print out a report:
```python
n_classifiers = len(classifiers)

for index, (name, classifier) in enumerate(classifiers.items()):
    classifier.fit(X_train, np.ravel(y_train))

    y_pred = classifier.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print("Accuracy (train) for %s: %0.1f%% " % (name, accuracy * 100))
    print(classification_report(y_test, y_pred))
```
The result is pretty good:
```
Accuracy (train) for Linear SVC: 78.6% 
              precision    recall  f1-score   support

     chinese       0.71      0.67      0.69       242
      indian       0.88      0.86      0.87       234
    japanese       0.79      0.74      0.76       254
      korean       0.85      0.81      0.83       242
        thai       0.71      0.86      0.78       227

    accuracy                           0.79      1199
   macro avg       0.79      0.79      0.79      1199
weighted avg       0.79      0.79      0.79      1199
```
✅ Learn about Linear SVC
Support-Vector Classification (SVC) is a child of the Support-Vector Machines family of ML techniques (learn more about these below). In this method, you can choose a 'kernel' to decide how to separate the labels. The 'C' parameter refers to 'regularization': larger values of C mean less regularization. The kernel can be one of [several](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC); here we set it to 'linear' to get a Linear SVC. Probability defaults to 'false'; here we set it to 'true' to gather probability estimates. We set the random state to '0' so that the shuffling used to produce those estimates is reproducible.
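Since we set `probability=True`, we can also inspect the per-class probability estimates the model produces. A minimal sketch, assuming the `classifiers` dictionary and the train/test split defined above:
```python
# Inspect class probabilities from the trained Linear SVC.
# Assumes 'classifiers' and the X/y splits from the cells above.
svc = classifiers['Linear SVC']
svc.fit(X_train, np.ravel(y_train))

# predict_proba returns one probability per class, per sample;
# the columns line up with svc.classes_.
probabilities = svc.predict_proba(X_test[:3])
for cls, p in zip(svc.classes_, probabilities[0]):
    print("%s: %.2f" % (cls, p))
```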
## K-Neighbors Classifier
The previous classifier was good, and worked well with the data, but maybe we can get better accuracy. Try a K-Neighbors Classifier. Add a line to your classifier dictionary (add a comma after the Linear SVC item):
```python
'KNN classifier': KNeighborsClassifier(C),
```
Note that `C` is passed here as the first positional argument of `KNeighborsClassifier`, which is `n_neighbors`, so this classifier uses 10 neighbors.
The result is a little worse:
```
Accuracy (train) for KNN classifier: 73.8% 
              precision    recall  f1-score   support

     chinese       0.64      0.67      0.66       242
      indian       0.86      0.78      0.82       234
    japanese       0.66      0.83      0.74       254
      korean       0.94      0.58      0.72       242
        thai       0.71      0.82      0.76       227

    accuracy                           0.74      1199
   macro avg       0.76      0.74      0.74      1199
weighted avg       0.76      0.74      0.74      1199
```
✅ Learn about [K-Neighbors](https://scikit-learn.org/stable/modules/neighbors.html#neighbors)
K-Neighbors is part of the "neighbors" family of ML methods, which can be used for both supervised and unsupervised learning. In this method, the label of a new data point is predicted from a predefined number of its nearest neighbors in the training data, typically by majority vote.
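Since the accuracy dipped, it is worth checking how sensitive the model is to the number of neighbors. A quick sketch, assuming the same splits as above (the values of k below are just examples):
```python
# Sweep a few values of n_neighbors to see how KNN responds.
# Assumes X_train, X_test, y_train, y_test from the split above.
for k in (3, 5, 10, 20):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, np.ravel(y_train))
    score = accuracy_score(y_test, knn.predict(X_test))
    print("k=%d: %.1f%%" % (k, score * 100))
```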
## Support Vector Classifier
Let's try for a little better accuracy with a Support Vector Classifier. Add a comma after the K-Neighbors item, and then add this line:
```python
'SVC': SVC(),
```
The result is quite good!
```
Accuracy (train) for SVC: 83.2% 
              precision    recall  f1-score   support

     chinese       0.79      0.74      0.76       242
      indian       0.88      0.90      0.89       234
    japanese       0.87      0.81      0.84       254
      korean       0.91      0.82      0.86       242
        thai       0.74      0.90      0.81       227

    accuracy                           0.83      1199
   macro avg       0.84      0.83      0.83      1199
weighted avg       0.84      0.83      0.83      1199
```
✅ Learn about [Support-Vectors](https://scikit-learn.org/stable/modules/svm.html#svm)
Support-Vector Classifiers are part of the [Support-Vector Machine](https://en.wikipedia.org/wiki/Support-vector_machine) family of ML methods that are used for classification and regression tasks. SVMs "map training examples to points in space" to maximize the distance between two categories. Subsequent data is mapped into this space so that its category can be predicted. Note that `SVC()` with no arguments uses the default 'rbf' kernel and C=1.0, which is why it behaves differently from the Linear SVC above.
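Much of the jump in accuracy comes from that non-linear kernel. As a quick check, here is a sketch comparing kernels on the same split (assuming the variables defined above):
```python
# Compare SVC kernels on the same train/test split.
# SVC() defaults to kernel='rbf'.
for kernel in ('linear', 'poly', 'rbf'):
    clf = SVC(kernel=kernel)
    clf.fit(X_train, np.ravel(y_train))
    score = accuracy_score(y_test, clf.predict(X_test))
    print("%s kernel: %.1f%%" % (kernel, score * 100))
```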
## Ensemble Classifiers
Let's follow the path to the very end, even though the previous test was quite good. Let's try some 'Ensemble Classifiers', specifically Random Forest and AdaBoost:
```python
'RFST': RandomForestClassifier(n_estimators=100),
'ADA': AdaBoostClassifier(n_estimators=100)
```
The result is very good, especially for Random Forest:
```
Accuracy (train) for RFST: 84.5% 
              precision    recall  f1-score   support

     chinese       0.80      0.77      0.78       242
      indian       0.89      0.92      0.90       234
    japanese       0.86      0.84      0.85       254
      korean       0.88      0.83      0.85       242
        thai       0.80      0.87      0.83       227

    accuracy                           0.84      1199
   macro avg       0.85      0.85      0.84      1199
weighted avg       0.85      0.84      0.84      1199

Accuracy (train) for ADA: 72.4% 
              precision    recall  f1-score   support

     chinese       0.64      0.49      0.56       242
      indian       0.91      0.83      0.87       234
    japanese       0.68      0.69      0.69       254
      korean       0.73      0.79      0.76       242
        thai       0.67      0.83      0.74       227

    accuracy                           0.72      1199
   macro avg       0.73      0.73      0.72      1199
weighted avg       0.73      0.72      0.72      1199
```
✅ Learn about [Ensemble Classifiers](https://scikit-learn.org/stable/modules/ensemble.html)
This method of Machine Learning "combines the predictions of several base estimators" to improve the model's quality. In our example, we used Random Forest and AdaBoost.
- [Random Forest](https://scikit-learn.org/stable/modules/ensemble.html#forest), an averaging method, builds a 'forest' of 'decision trees' infused with randomness to avoid overfitting. The n_estimators parameter is set to the number of trees.
- [AdaBoost](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html) fits a classifier to a dataset and then fits copies of that classifier to the same dataset, adjusting the weights of incorrectly classified items so that each subsequent copy focuses on correcting the harder cases.
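A nice side effect of the Random Forest's averaging approach is that it reports how much each ingredient column contributed to its decisions. A rough sketch, assuming the fitting loop above has already run:
```python
# Peek at which ingredient features the Random Forest leaned on most.
# Assumes 'classifiers' has been fit by the loop above.
rfst = classifiers['RFST']
importances = sorted(
    zip(recipes_feature_df.columns, rfst.feature_importances_),
    key=lambda pair: pair[1], reverse=True)
for feature, importance in importances[:10]:
    print("%s: %.3f" % (feature, importance))
```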
## 🚀Challenge
Each of these techniques has a large number of parameters that you can tweak. Research each one's default parameters and think about what tweaking these parameters would mean for the model's quality.
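You can list a model's defaults programmatically and then retrain with one value changed at a time. A minimal sketch to get started (the parameter values below are examples, not recommendations):
```python
# Inspect default hyperparameters, then probe one of them.
# Assumes X_train, X_test, y_train, y_test from the split above.
rf = RandomForestClassifier()
print(rf.get_params())  # shows n_estimators, max_depth, criterion, etc.

for n in (10, 100, 300):  # example values for n_estimators
    model = RandomForestClassifier(n_estimators=n)
    model.fit(X_train, np.ravel(y_train))
    score = accuracy_score(y_test, model.predict(X_test))
    print("n_estimators=%d: %.1f%%" % (n, score * 100))
```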
## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/22/)
## Review & Self Study
There's a lot of jargon in these lessons, so take a minute to review [this list](https://docs.microsoft.com/en-us/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-15963-cxa) of useful terminology!
## Assignment
[Parameter Play](assignment.md)

@ -1,9 +1,11 @@
# Parameter Play
## Instructions
There are a lot of parameters that are set by default when working with these classifiers. IntelliSense in VS Code can help you dig into them. Adopt one of the ML classification techniques in this lesson and retrain models, tweaking various parameter values. Build a notebook explaining why some changes help the model quality while others degrade it. Be detailed in your answer.
## Rubric
| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | ---------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------- | ----------------------------- |
| | A notebook is presented with a classifier fully built up and its parameters tweaked and changes explained in textboxes | A notebook is partially presented or poorly explained | A notebook is buggy or flawed |

@ -1,15 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Build Classification Model"
]
"# Build More Classification Models"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 57,
"metadata": {},
"outputs": [
{
@ -42,7 +42,7 @@
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Unnamed: 0</th>\n <th>cuisine</th>\n <th>almond</th>\n <th>angelica</th>\n <th>anise</th>\n <th>anise_seed</th>\n <th>apple</th>\n <th>apple_brandy</th>\n <th>apricot</th>\n <th>armagnac</th>\n <th>...</th>\n <th>whiskey</th>\n <th>white_bread</th>\n <th>white_wine</th>\n <th>whole_grain_wheat_flour</th>\n <th>wine</th>\n <th>wood</th>\n <th>yam</th>\n <th>yeast</th>\n <th>yogurt</th>\n <th>zucchini</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>indian</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>indian</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>2</td>\n <td>indian</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>3</td>\n <td>indian</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>4</td>\n <td>indian</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n </tbody>\n</table>\n<p>5 rows × 382 columns</p>\n</div>"
},
"metadata": {},
"execution_count": 15
"execution_count": 57
}
],
"source": [
@ -53,7 +53,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 58,
"metadata": {},
"outputs": [
{
@ -69,7 +69,7 @@
]
},
"metadata": {},
"execution_count": 16
"execution_count": 58
}
],
"source": [
@ -79,7 +79,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 59,
"metadata": {},
"outputs": [
{
@ -112,7 +112,7 @@
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>almond</th>\n <th>angelica</th>\n <th>anise</th>\n <th>anise_seed</th>\n <th>apple</th>\n <th>apple_brandy</th>\n <th>apricot</th>\n <th>armagnac</th>\n <th>artemisia</th>\n <th>artichoke</th>\n <th>...</th>\n <th>whiskey</th>\n <th>white_bread</th>\n <th>white_wine</th>\n <th>whole_grain_wheat_flour</th>\n <th>wine</th>\n <th>wood</th>\n <th>yam</th>\n <th>yeast</th>\n <th>yogurt</th>\n <th>zucchini</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n </tbody>\n</table>\n<p>5 rows × 380 columns</p>\n</div>"
},
"metadata": {},
"execution_count": 17
"execution_count": 59
}
],
"source": [
@ -120,14 +120,23 @@
"recipes_feature_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Try different classifiers"
]
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.neighbors import KNeighborsClassifier\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.svm import SVC\n",
"from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier\n",
"from sklearn.model_selection import train_test_split, cross_val_score\n",
"from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve\n",
"import numpy as np"
@ -135,72 +144,119 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"X_train, X_test, y_train, y_test = train_test_split(recipes_feature_df, recipes_label_df, test_size=0.3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Try different classifiers"
]
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"\r\n",
"C = 10\r\n",
"# Create different classifiers.\r\n",
"classifiers = {\r\n",
" 'L1 logistic': LogisticRegression(C=C, penalty='l1',\r\n",
" solver='saga',\r\n",
" multi_class='multinomial',\r\n",
" max_iter=10000),\r\n",
" 'L2 logistic (Multinomial)': LogisticRegression(C=C, penalty='l2',\r\n",
" solver='saga',\r\n",
" multi_class='multinomial',\r\n",
" max_iter=10000),\r\n",
" 'L2 logistic (OvR)': LogisticRegression(C=C, penalty='l2',\r\n",
" solver='saga',\r\n",
" multi_class='ovr',\r\n",
" max_iter=10000),\r\n",
" 'Linear SVC': SVC(kernel='linear', C=C, probability=True,\r\n",
" random_state=0)\r\n",
"}\r\n"
"\n",
"C = 10\n",
"# Create different classifiers.\n",
"classifiers = {\n",
" 'Linear SVC': SVC(kernel='linear', C=C, probability=True,random_state=0),\n",
" 'KNN classifier': KNeighborsClassifier(C),\n",
" 'SVC': SVC(),\n",
" 'RFST': RandomForestClassifier(n_estimators=100),\n",
" 'ADA': AdaBoostClassifier(n_estimators=100)\n",
" \n",
"}\n"
]
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 63,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Accuracy (train) for L1 logistic: 79.9% \n",
"Accuracy (train) for L2 logistic (Multinomial): 80.6% \n",
"Accuracy (train) for L2 logistic (OvR): 81.2% \n",
"Accuracy (train) for Linear SVC: 78.9% \n"
"Accuracy (train) for Linear SVC: 79.7% \n",
" precision recall f1-score support\n",
"\n",
" chinese 0.71 0.73 0.72 232\n",
" indian 0.91 0.88 0.89 251\n",
" japanese 0.76 0.78 0.77 239\n",
" korean 0.86 0.75 0.80 244\n",
" thai 0.76 0.84 0.80 233\n",
"\n",
" accuracy 0.80 1199\n",
" macro avg 0.80 0.80 0.80 1199\n",
"weighted avg 0.80 0.80 0.80 1199\n",
"\n",
"Accuracy (train) for KNN classifier: 72.9% \n",
" precision recall f1-score support\n",
"\n",
" chinese 0.62 0.68 0.65 232\n",
" indian 0.87 0.82 0.85 251\n",
" japanese 0.62 0.83 0.71 239\n",
" korean 0.92 0.55 0.68 244\n",
" thai 0.73 0.76 0.75 233\n",
"\n",
" accuracy 0.73 1199\n",
" macro avg 0.75 0.73 0.73 1199\n",
"weighted avg 0.76 0.73 0.73 1199\n",
"\n",
"Accuracy (train) for SVC: 81.8% \n",
" precision recall f1-score support\n",
"\n",
" chinese 0.78 0.71 0.74 232\n",
" indian 0.92 0.90 0.91 251\n",
" japanese 0.79 0.80 0.80 239\n",
" korean 0.85 0.78 0.82 244\n",
" thai 0.75 0.89 0.81 233\n",
"\n",
" accuracy 0.82 1199\n",
" macro avg 0.82 0.82 0.82 1199\n",
"weighted avg 0.82 0.82 0.82 1199\n",
"\n",
"Accuracy (train) for RFST: 83.3% \n",
" precision recall f1-score support\n",
"\n",
" chinese 0.80 0.75 0.77 232\n",
" indian 0.91 0.92 0.91 251\n",
" japanese 0.81 0.82 0.82 239\n",
" korean 0.85 0.81 0.83 244\n",
" thai 0.79 0.86 0.82 233\n",
"\n",
" accuracy 0.83 1199\n",
" macro avg 0.83 0.83 0.83 1199\n",
"weighted avg 0.83 0.83 0.83 1199\n",
"\n",
"Accuracy (train) for ADA: 70.8% \n",
" precision recall f1-score support\n",
"\n",
" chinese 0.60 0.45 0.51 232\n",
" indian 0.90 0.81 0.85 251\n",
" japanese 0.65 0.72 0.68 239\n",
" korean 0.72 0.76 0.74 244\n",
" thai 0.67 0.79 0.72 233\n",
"\n",
" accuracy 0.71 1199\n",
" macro avg 0.71 0.71 0.70 1199\n",
"weighted avg 0.71 0.71 0.70 1199\n",
"\n"
]
}
],
"source": [
"n_classifiers = len(classifiers)\r\n",
"\r\n",
"for index, (name, classifier) in enumerate(classifiers.items()):\r\n",
" classifier.fit(X_train, np.ravel(y_train))\r\n",
"\r\n",
" y_pred = classifier.predict(X_test)\r\n",
" accuracy = accuracy_score(y_test, y_pred)\r\n",
" print(\"Accuracy (train) for %s: %0.1f%% \" % (name, accuracy * 100))\r\n"
"n_classifiers = len(classifiers)\n",
"\n",
"for index, (name, classifier) in enumerate(classifiers.items()):\n",
" classifier.fit(X_train, np.ravel(y_train))\n",
"\n",
" y_pred = classifier.predict(X_test)\n",
" accuracy = accuracy_score(y_test, y_pred)\n",
" print(\"Accuracy (train) for %s: %0.1f%% \" % (name, accuracy * 100))\n",
" print(classification_report(y_test,y_pred))"
]
},
{
@ -216,8 +272,9 @@
"hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3.7.0 64-bit ('3.7')"
"name": "python37364bit8d3b438fb5fc4430a93ac2cb74d693a7",
"display_name": "Python 3.7.3 64-bit",
"language": "python"
},
"language_info": {
"codemirror_mode": {

@ -6,4 +6,4 @@ In this section of the curriculum, you will be introduced to some real-world app
1. [Real-World Applications for ML](1-Applications/README.md)
## Credits
"Real-World Applications" was written by a team of folks, including [Jen Looper](https://twitter.com/jenlooper), [Ornella Altunyan](https://twitter.com/ornelladotcom) and ...
"Real-World Applications" was written by a team of folks, including [Jen Looper](https://twitter.com/jenlooper) and [Ornella Altunyan](https://twitter.com/ornelladotcom).

@ -16,9 +16,11 @@ Azure Cloud Advocates at Microsoft are pleased to offer a 12-week, 24-lesson cur
Travel with us around the world as we apply these classic techniques to data from many areas of the world. Each lesson includes pre- and post-lesson quizzes, written instructions to complete the lesson, a solution, an assignment and more. Our project-based pedagogy allows you to learn while building, a proven way for new skills to 'stick'.
**✍️ Hearty thanks to our authors** Jen Looper, Stephen Howell, Francesca Lazzeri, Tomomi Imura, Cassie Breviu, Dmitry Soshkinov, Ornella Altunyan, Amy Boyd
**🎨 Thanks as well to our illustrators** Tomomi Imura, Dasani Madipalli, and Jen Looper
**🙏 Special thanks 🙏 to our Microsoft Student Ambassador reviewers and content contributors**, notably Rishit Dagli, Muhammad Sakib Khan Inan, Rohan Raj, Alexandru Petrescu, Abhishek Jaiswal, Nawrin Tabassum, Ioan Samuila, and Snigdha Agarwal
# Getting Started
@ -69,7 +71,7 @@ By ensuring that the content aligns with projects, the process is made more enga
| Lesson Number | Section | Concepts Taught | Learning Objectives | Linked Lesson | Author |
| :-----------: | :--------------------------------------------------------: | :-------------------------------------------------: | ------------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------: | :---------: |
| 01 | [Introduction](1-Introduction/README.md) | Introduction to Machine Learning | Learn the basic concepts behind Machine Learning | [lesson](1-Introduction/1-intro-to-ML/README.md) | Muhammad |
| 02 | [Introduction](1-Introduction/README.md) | The History of Machine Learning | Learn the history underlying this field | [lesson](1-Introduction/2-history-of-ML/README.md) | Jen and Amy |
| 03 | [Introduction](1-Introduction/README.md) | Fairness and Machine Learning | What are the important philosophical issues around fairness that students should consider when building and applying ML models? | [lesson](1-Introduction/3-fairness/README.md) | Tomomi |
| 04 | Introduction to Regression | [Regression](2-Regression/README.md) | Get started with Python and Scikit-Learn for Regression models | [lesson](2-Regression/1-Tools/README.md) | Jen |
@ -78,9 +80,9 @@ By ensuring that the content aligns with projects, the process is made more enga
| 07 | North American Pumpkin Prices 🎃 | [Regression](2-Regression/README.md) | Build a Logistic Regression model | [lesson](2-Regression/4-Logistic/README.md) | Jen |
| 08 | A Web App 🔌 | [Web App](3-Web-App/README.md) | Build a Web app to use your trained model | [lesson](3-Web-App/README.md) | Jen |
| 09 | Introduction to Classification | [Classification](4-Classification/README.md) | Clean, Prep, and Visualize your Data; Introduction to Classification | [lesson](4-Classification/1-Introduction/README.md) | Cassie |
| 10 | Delicious Asian and Indian Recipes 🍜 | [Classification](4-Classification/README.md) | Build a Discriminative Model | [lesson](4-Classification/2-Descriminative/README.md) | Cassie |
| 11 | Delicious Asian and Indian Recipes 🍜 | [Classification](4-Classification/README.md) | Build a Generative Model | [lesson](4-Classification/3-Generative/README.md) | Cassie |
| 12 | Delicious Asian and Indian Recipes 🍜 | [Classification](4-Classification/README.md) | Build a Web App using your Model | [lesson](4-Classification/4-Applied/README.md) | Jen |
| 13 | Introduction to Clustering | [Clustering](5-Clustering/README.md) | Clean, Prep, and Visualize your Data; Introduction to Clustering | [lesson](5-Clustering/1-Visualize/README.md) | Jen |
| 14 | Exploring Nigerian Musical Tastes 🎧 | [Clustering](5-Clustering/README.md) | Explore the K-Means Clustering Method | [lesson](5-Clustering/2-K-Means/README.md) | Jen |
| 15 | Introduction to Natural Language Processing ☕️ | [Natural Language Processing](6-NLP/README.md) | Learn the basics about NLP by building a simple bot | [lesson](6-NLP/1-Introduction-to-NLP/README.md) | Stephen |
@ -92,7 +94,7 @@ By ensuring that the content aligns with projects, the process is made more enga
| 21 | ⚡️ World Power Usage ⚡️ Time Series Forecasting with ARIMA ⚡️ | [Time Series](7-TimeSeries/README.md) | Time Series Forecasting with ARIMA | [lesson](7-TimeSeries/2-ARIMA/README.md) | Francesca |
| 22 | Introduction to Reinforcement Learning | [Reinforcement Learning](8-Reinforcement/README.md) | tbd | [lesson]() | Dmitry |
| 23 | Help Peter avoid the Wolf! 🐺 | [Reinforcement Learning](8-Reinforcement/README.md) | tbd | [lesson]() | Dmitry |
| 24 | Real-World ML Scenarios and Applications | [ML in the Wild](9-Real-World/README.md) | Interesting and Revealing real-world applications of classical ML | [lesson](9-Real-World/1-Applications/README.md) | Team |
## Offline access
You can run this documentation offline by using [Docsify](https://docsify.js.org/#/). Fork this repo, [install Docsify](https://docsify.js.org/#/quickstart) on your local machine, and then in the root folder of this repo, type `docsify serve`. The website will be served on port 3000 on your localhost: `localhost:3000`.
