Merge branch 'main' of https://github.com/microsoft/ML-For-Beginners

5 years ago · aa68db05c8
parent f88bd46fa9 a396714536
commit aa68db05c8
6 changed files with 4482 additions and 102 deletions
--- a/4-Classification/1-Introduction/solution/notebook.ipynb
+++ b/4-Classification/1-Introduction/solution/notebook.ipynb
--- a/4-Classification/2-Classifiers-1/README.md
+++ b/4-Classification/2-Classifiers-1/README.md
@@ -1,44 +1,160 @@
 # Recipe Classifiers 1
-
+In this lesson, you will use the balanced, clean recipe dataset you saved in the last lesson. You will train a variety of classifiers on it to predict a national cuisine from a group of ingredients. Along the way, you'll learn more about how algorithms can be applied to classification tasks.
 ## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/19/)
 # Preparatory steps to start this lesson
 Assuming you completed Lesson 1, make sure that a `cleaned_cuisine.csv` file exists in the root `/data` folder for these four lessons.
 Working in this lesson's `notebook.ipynb` file, import that file along with the Pandas library:
 ```python
 import pandas as pd
 recipes_df = pd.read_csv("../../data/cleaned_cuisine.csv")
 recipes_df.head()
 ```
 The data looks like this:
 |     | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
 | --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
 | 0   | 0          | indian  | 0      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 0      | 0        |
 | 1   | 1          | indian  | 1      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 0      | 0        |
 | 2   | 2          | indian  | 0      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 0      | 0        |
 | 3   | 3          | indian  | 0      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 0      | 0        |
 | 4   | 4          | indian  | 0      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 1      | 0        |
 Now, import several more libraries:
 ```python
 from sklearn.linear_model import LogisticRegression
 from sklearn.model_selection import train_test_split, cross_val_score
 from sklearn.metrics import accuracy_score, precision_score, confusion_matrix, classification_report, precision_recall_curve
 from sklearn.svm import SVC
 import numpy as np
 ```
 Split the data into features (X) and labels (y) for training. The `cuisine` column holds the labels:
 ```python
 recipes_label_df = recipes_df['cuisine']
 recipes_label_df.head()
 ```
-Describe what we will learn
+It will look like this:
-### Introduction
+```
 0    indian
 1    indian
 2    indian
 3    indian
 4    indian
 Name: cuisine, dtype: object
 ```
-Describe what will be covered
+Drop the `Unnamed: 0` column and the `cuisine` column, and save the rest of the data as trainable features:
-> Notes
+```python
 recipes_feature_df = recipes_df.drop(['Unnamed: 0', 'cuisine'], axis=1)
 recipes_feature_df.head()
 ```
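The `Unnamed: 0` column is most likely the row index that was written out when the CSV was saved in the previous lesson (an assumption; the saving code isn't shown here). A minimal sketch, using a tiny in-memory CSV with hypothetical values, of how `index_col=0` could absorb that column at load time instead of dropping it afterwards:

```python
import io
import pandas as pd

# Hypothetical three-column sample; the real file has hundreds of ingredient columns
csv_text = ",cuisine,almond,yogurt\n0,indian,0,0\n1,indian,1,0\n"

# Default load: the unnamed first column shows up as "Unnamed: 0"
default_df = pd.read_csv(io.StringIO(csv_text))

# Treat the first column as the index so it never becomes a feature
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Only the label column still needs to be dropped
features = indexed_df.drop(columns=["cuisine"])
print(list(default_df.columns))
print(list(features.columns))
```

Either approach gives the same features; the explicit `drop` used in the lesson has the advantage of making the step visible.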
-### Prerequisite
+Your features look like this:
-What steps should have been covered before this lesson?
+|     | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | artemisia | artichoke | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
 | --- | -----: | -------: | ----: | ---------: | ----: | -----------: | ------: | -------: | --------: | --------: | --- | ------: | ----------: | ---------: | ----------------------: | ---: | ---: | --: | ----: | -----: | -------: |
 | 0   |      0 |        0 |     0 |          0 |     0 |            0 |       0 |        0 |         0 |         0 | ... |       0 |           0 |          0 |                       0 |    0 |    0 |   0 |     0 |      0 |        0 |
 | 1   |      1 |        0 |     0 |          0 |     0 |            0 |       0 |        0 |         0 |         0 | ... |       0 |           0 |          0 |                       0 |    0 |    0 |   0 |     0 |      0 |        0 |
 | 2   |      0 |        0 |     0 |          0 |     0 |            0 |       0 |        0 |         0 |         0 | ... |       0 |           0 |          0 |                       0 |    0 |    0 |   0 |     0 |      0 |        0 |
 | 3   |      0 |        0 |     0 |          0 |     0 |            0 |       0 |        0 |         0 |         0 | ... |       0 |           0 |          0 |                       0 |    0 |    0 |   0 |     0 |      0 |        0 |
 | 4   |      0 |        0 |     0 |          0 |     0 |            0 |       0 |        0 |         0 |         0 | ... |       0 |           0 |          0 |                       0 |    0 |    0 |   0 |     0 |      1 |        0 |
-### Preparation
+Now you are ready to train your model!
-Preparatory steps to start this lesson
+## Choosing your classifier
---
+Now that your data is clean and ready for training, you have to decide which algorithm to use for the job. 
-[Step through content in blocks]
+TODO: discuss the types
-## [Topic 1]
+✅ Todo: knowledge check
-### Task:
+## Train your model
-Work together to progressively enhance your codebase to build the project with shared code:
+Let's train that model. Split your data into training and testing groups:
-```html
+```python
-code blocks
+X_train, X_test, y_train, y_test = train_test_split(recipes_feature_df, recipes_label_df, test_size=0.3)
 ```
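Because the split is random, accuracy will shift a little between runs. A minimal sketch, assuming you want a repeatable split: `random_state` and `stratify` are existing `train_test_split` parameters, but the toy data and the seed value below are illustrative assumptions, not part of the lesson.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the recipe data (hypothetical values, not the lesson's dataset)
features = pd.DataFrame({
    "almond": [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
    "yogurt": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
})
labels = pd.Series(["indian", "thai"] * 5, name="cuisine")

def split(seed):
    # Fixing random_state makes the shuffle repeatable;
    # stratify keeps the cuisine proportions similar in both parts
    return train_test_split(
        features, labels, test_size=0.3, random_state=seed, stratify=labels
    )

X_train, X_test, y_train, y_test = split(0)
X_train_again, X_test_again, _, _ = split(0)

print(len(X_train), len(X_test))
print(X_test.index.equals(X_test_again.index))
```

Leaving `random_state` unset, as the lesson does, is fine for exploration; fixing it mainly helps when you want printed results to match the text.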
-✅ Knowledge Check - use this moment to stretch students' knowledge with open questions
+Train a `LogisticRegression` model with the multiclass setting `ovr` (one-vs-rest) and the `lbfgs` solver.
 ✅ Todo: explain ravel
 ```python
 lr = LogisticRegression(multi_class='ovr', solver='lbfgs')
 model = lr.fit(X_train, np.ravel(y_train))
 accuracy = model.score(X_test, y_test)
 print(f"Accuracy is {accuracy}")
 ```
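On the `np.ravel` call: scikit-learn estimators expect the labels as a 1-D array of shape `(n_samples,)`, and `np.ravel` flattens whatever it is given into that shape. A small sketch with hypothetical labels; note that for a pandas Series, which is already 1-D, the call only converts it to a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical labels held as a one-column DataFrame: shape (3, 1)
labels_frame = pd.DataFrame({"cuisine": ["indian", "thai", "korean"]})
# The same labels as a Series: shape (3,)
labels_series = labels_frame["cuisine"]

flat_from_frame = np.ravel(labels_frame)    # 2-D column flattened to 1-D
flat_from_series = np.ravel(labels_series)  # already 1-D, returned as an ndarray

print(labels_frame.shape, flat_from_frame.shape, flat_from_series.shape)
```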
-## [Topic 2]
+The accuracy is good at over 80%!
-## [Topic 3]
+You can see this model in action by testing one row of data (#50):
 ```python
 print(f'ingredients: {X_test.iloc[50][X_test.iloc[50]!=0].keys()}')
 print(f'cuisine: {y_test.iloc[50]}')
 ```
 The result is printed:
 ```
 ingredients: Index(['cilantro', 'onion', 'pea', 'potato', 'tomato', 'vegetable_oil'], dtype='object')
 cuisine: indian
 ```
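The expression used to list the ingredients is boolean masking on a single row: compare the row to zero, then keep only the column labels where the comparison is true. A minimal sketch on a hypothetical two-recipe frame:

```python
import pandas as pd

# Hypothetical one-hot ingredient frame, not the lesson's dataset
X = pd.DataFrame({"cilantro": [1, 0], "onion": [1, 1], "yam": [0, 0]})

row = X.iloc[0]          # one recipe as a Series indexed by ingredient name
mask = row != 0          # True where the ingredient is present
ingredients = row[mask].index.tolist()

print(ingredients)
```

For a Series, `.keys()` and `.index` return the same labels, so either spelling works here.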
 ✅ Try a different row number!
 Digging deeper, you can check how confident the model is in this prediction by looking at the probability it assigns to each cuisine:
 ```python
 test = X_test.iloc[50].values.reshape(-1, 1).T
 proba = model.predict_proba(test)
 classes = model.classes_
 resultdf = pd.DataFrame(data=proba, columns=classes)
 topPrediction = resultdf.T.sort_values(by=[0], ascending=False)
 topPrediction.head()
 ```
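`predict_proba` expects a 2-D input with one row per sample, which is why the single recipe is reshaped first: `reshape(-1, 1).T` yields shape `(1, n_features)`, the same as `reshape(1, -1)`. The sketch below uses hypothetical toy data and default `LogisticRegression` settings (both assumptions, not the lesson's model) to show that equivalence, plus an alternative that passes a one-row DataFrame so the column names are kept.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical one-hot ingredient data, not the lesson's dataset
X = pd.DataFrame({"cumin": [1, 1, 0, 0, 1, 0], "soy_sauce": [0, 0, 1, 1, 0, 1]})
y = np.array(["indian", "indian", "chinese", "chinese", "indian", "chinese"])
model = LogisticRegression().fit(X, y)

row = X.iloc[0]
as_column_then_transposed = row.values.reshape(-1, 1).T
as_single_row = row.values.reshape(1, -1)
print(as_column_then_transposed.shape, as_single_row.shape)

# Double brackets select a one-row DataFrame, so feature names are preserved
proba = model.predict_proba(X.iloc[[0]])

# Pair each probability with its class label and rank them
ranked = pd.Series(proba[0], index=model.classes_).sort_values(ascending=False)
print(ranked)
```

Newer scikit-learn versions may warn when a model fitted on a DataFrame is given a bare NumPy array; the one-row DataFrame form avoids that.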
 The result is printed. Indian cuisine is the model's best guess, with a probability of about 72%:
 |          |        0 |
 | -------: | -------: |
 |   indian | 0.715851 |
 |  chinese | 0.229475 |
 | japanese | 0.029763 |
 |   korean | 0.017277 |
 |     thai | 0.007634 |
 ✅ Can you explain why the model is pretty sure this is an Indian recipe?
 Get more detail by printing a classification report, as you did in the Regression lessons:
 ```python
 y_pred = model.predict(X_test)
 print(classification_report(y_test,y_pred))
 ```
 |              | precision | recall | f1-score | support |
 | ------------ | --------- | ------ | -------- | ------- |
 | chinese      | 0.73      | 0.71   | 0.72     | 229     |
 | indian       | 0.91      | 0.93   | 0.92     | 254     |
 | japanese     | 0.70      | 0.75   | 0.72     | 220     |
 | korean       | 0.86      | 0.76   | 0.81     | 242     |
 | thai         | 0.79      | 0.85   | 0.82     | 254     |
 | accuracy     |           |        | 0.80     | 1199    |
 | macro avg    | 0.80      | 0.80   | 0.80     | 1199    |
 | weighted avg | 0.80      | 0.80   | 0.80     | 1199    |
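To read the report, it helps to see where each column comes from. The sketch below uses a handful of hypothetical predictions (not the lesson's results) to derive per-class precision, recall, F1, and support from a confusion matrix, and checks the first two against scikit-learn's own functions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true and predicted labels for six recipes
y_true = ["indian", "indian", "thai", "thai", "thai", "korean"]
y_pred = ["indian", "thai", "thai", "thai", "korean", "korean"]
labels = ["indian", "korean", "thai"]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=labels)
true_pos = np.diag(cm)

precision = true_pos / cm.sum(axis=0)  # correct / everything predicted as the class
recall = true_pos / cm.sum(axis=1)     # correct / everything truly in the class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)               # number of true samples per class
accuracy = true_pos.sum() / cm.sum()

for name, p, r, f, s in zip(labels, precision, recall, f1, support):
    print(f"{name:>8}  {p:.2f}  {r:.2f}  {f:.2f}  {s}")
print(f"accuracy  {accuracy:.2f}")

# Cross-check against scikit-learn's per-class scores
sk_precision = precision_score(y_true, y_pred, labels=labels, average=None)
sk_recall = recall_score(y_true, y_pred, labels=labels, average=None)
```

The `macro avg` row is the plain mean of the per-class values, while `weighted avg` weights each class by its support.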
 ## 🚀Challenge
 Add a challenge for students to work on collaboratively in class to enhance the project
--- a/4-Classification/2-Classifiers-1/notebook.ipynb
+++ b/4-Classification/2-Classifiers-1/notebook.ipynb
@@ -0,0 +1,28 @@
 {
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": 3
  },
  "orig_nbformat": 2
 },
 "nbformat": 4,
 "nbformat_minor": 2,
 "cells": [
  {
   "source": [
    "# Build Classification Models"
   ],
   "cell_type": "markdown",
   "metadata": {}
  }
 ]
 }
--- a/4-Classification/2-Classifiers-1/solution/notebook.ipynb
+++ b/4-Classification/2-Classifiers-1/solution/notebook.ipynb
@@ -1,29 +1,15 @@
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Build Classification Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
-    "from sklearn.linear_model import LogisticRegression\n",
+    "# Build Classification Models"
-    "from sklearn.model_selection import train_test_split, cross_val_score\n",
+   ],
-    "from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve\n",
+   "cell_type": "markdown",
-    "from sklearn.svm import SVC\n",
+   "metadata": {}
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 26,
+   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
@@ -56,17 +42,31 @@
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Unnamed: 0</th>\n      <th>cuisine</th>\n      <th>almond</th>\n      <th>angelica</th>\n      <th>anise</th>\n      <th>anise_seed</th>\n      <th>apple</th>\n      <th>apple_brandy</th>\n      <th>apricot</th>\n      <th>armagnac</th>\n      <th>...</th>\n      <th>whiskey</th>\n      <th>white_bread</th>\n      <th>white_wine</th>\n      <th>whole_grain_wheat_flour</th>\n      <th>wine</th>\n      <th>wood</th>\n      <th>yam</th>\n      <th>yeast</th>\n      <th>yogurt</th>\n      <th>zucchini</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0</td>\n      <td>indian</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>1</td>\n      <td>indian</td>\n      <td>1</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2</td>\n      <td>indian</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      
<td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>3</td>\n      <td>indian</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>4</td>\n      <td>indian</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>1</td>\n      <td>0</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 382 columns</p>\n</div>"
     },
     "metadata": {},
-     "execution_count": 26
+     "execution_count": 12
    }
   ],
   "source": [
    "import pandas as pd\n",
    "recipes_df = pd.read_csv(\"../../data/cleaned_cuisine.csv\")\n",
    "recipes_df.head()"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 27,
+   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.model_selection import train_test_split, cross_val_score\n",
    "from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve\n",
    "from sklearn.svm import SVC\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
@@ -82,7 +82,7 @@
      ]
     },
     "metadata": {},
-     "execution_count": 27
+     "execution_count": 14
    }
   ],
   "source": [
@@ -92,7 +92,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
@@ -125,7 +125,7 @@
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>almond</th>\n      <th>angelica</th>\n      <th>anise</th>\n      <th>anise_seed</th>\n      <th>apple</th>\n      <th>apple_brandy</th>\n      <th>apricot</th>\n      <th>armagnac</th>\n      <th>artemisia</th>\n      <th>artichoke</th>\n      <th>...</th>\n      <th>whiskey</th>\n      <th>white_bread</th>\n      <th>white_wine</th>\n      <th>whole_grain_wheat_flour</th>\n      <th>wine</th>\n      <th>wood</th>\n      <th>yam</th>\n      <th>yeast</th>\n      <th>yogurt</th>\n      <th>zucchini</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>1</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n  
    <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>1</td>\n      <td>0</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 380 columns</p>\n</div>"
     },
     "metadata": {},
-     "execution_count": 28
+     "execution_count": 15
    }
   ],
   "source": [
@@ -135,7 +135,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 29,
+   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
@@ -144,14 +144,14 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
-      "Accuracy is 0.810675562969141\n"
+      "Accuracy is 0.8023352793994996\n"
     ]
    }
   ],
@@ -165,26 +165,26 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 31,
+   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
-      "ingredients: Index(['bean', 'coriander', 'cumin', 'fenugreek', 'pepper', 'turmeric',\n       'vegetable_oil'],\n      dtype='object')\ncusine: thai\n"
+      "ingredients: Index(['cilantro', 'onion', 'pea', 'potato', 'tomato', 'vegetable_oil'], dtype='object')\ncuisine: indian\n"
     ]
    }
   ],
   "source": [
    "# test an item\n",
-    "print(f'ingredients: {X_test.iloc[20][X_test.iloc[20]!=0].keys()}')\n",
+    "print(f'ingredients: {X_test.iloc[50][X_test.iloc[50]!=0].keys()}')\n",
-    "print(f'cuisine: {y_test.iloc[20]}')"
+    "print(f'cuisine: {y_test.iloc[50]}')"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 32,
+   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
@@ -192,42 +192,42 @@
     "data": {
      "text/plain": [
       "                 0\n",
-       "indian    0.530435\n",
+       "indian    0.715851\n",
-       "thai      0.344293\n",
+       "chinese   0.229475\n",
-       "japanese  0.108792\n",
+       "japanese  0.029763\n",
-       "chinese   0.015001\n",
+       "korean    0.017277\n",
-       "korean    0.001480"
+       "thai      0.007634"
      ],
-      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>indian</th>\n      <td>0.530435</td>\n    </tr>\n    <tr>\n      <th>thai</th>\n      <td>0.344293</td>\n    </tr>\n    <tr>\n      <th>japanese</th>\n      <td>0.108792</td>\n    </tr>\n    <tr>\n      <th>chinese</th>\n      <td>0.015001</td>\n    </tr>\n    <tr>\n      <th>korean</th>\n      <td>0.001480</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
+      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>indian</th>\n      <td>0.715851</td>\n    </tr>\n    <tr>\n      <th>chinese</th>\n      <td>0.229475</td>\n    </tr>\n    <tr>\n      <th>japanese</th>\n      <td>0.029763</td>\n    </tr>\n    <tr>\n      <th>korean</th>\n      <td>0.017277</td>\n    </tr>\n    <tr>\n      <th>thai</th>\n      <td>0.007634</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "metadata": {},
-     "execution_count": 32
+     "execution_count": 24
    }
   ],
   "source": [
-    "# reshape to 2d array and transpose\r\n",
+    "# reshape to 2d array and transpose\n",
-    "test= X_test.iloc[20].values.reshape(-1, 1).T\r\n",
+    "test= X_test.iloc[50].values.reshape(-1, 1).T\n",
-    "# predict with score\r\n",
+    "# predict with score\n",
-    "proba = model.predict_proba(test)\r\n",
+    "proba = model.predict_proba(test)\n",
-    "classes = model.classes_\r\n",
+    "classes = model.classes_\n",
-    "# create df with classes and scores\r\n",
+    "# create df with classes and scores\n",
-    "resultdf = pd.DataFrame(data=proba, columns=classes)\r\n",
+    "resultdf = pd.DataFrame(data=proba, columns=classes)\n",
-    "\r\n",
+    "\n",
-    "# create df to show results\r\n",
+    "# create df to show results\n",
-    "topPrediction = resultdf.T.sort_values(by=[0], ascending = [False])\r\n",
+    "topPrediction = resultdf.T.sort_values(by=[0], ascending = [False])\n",
    "topPrediction.head()"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 33,
+   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
-      "              precision    recall  f1-score   support\n\n     chinese       0.75      0.67      0.70       231\n      indian       0.91      0.90      0.90       255\n    japanese       0.77      0.82      0.79       260\n      korean       0.83      0.83      0.83       220\n        thai       0.79      0.83      0.81       233\n\n    accuracy                           0.81      1199\n   macro avg       0.81      0.81      0.81      1199\nweighted avg       0.81      0.81      0.81      1199\n\n"
+      "              precision    recall  f1-score   support\n\n     chinese       0.73      0.71      0.72       229\n      indian       0.91      0.93      0.92       254\n    japanese       0.70      0.75      0.72       220\n      korean       0.86      0.76      0.81       242\n        thai       0.79      0.85      0.82       254\n\n    accuracy                           0.80      1199\n   macro avg       0.80      0.80      0.80      1199\nweighted avg       0.80      0.80      0.80      1199\n\n"
     ]
    }
   ],
@@ -245,7 +245,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 34,
+   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
@@ -272,17 +272,17 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 35,
+   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
-      "Accuracy (train) for L1 logistic: 79.4% \n",
+      "Accuracy (train) for L1 logistic: 79.5% \n",
-      "Accuracy (train) for L2 logistic (Multinomial): 79.2% \n",
+      "Accuracy (train) for L2 logistic (Multinomial): 80.1% \n",
-      "Accuracy (train) for L2 logistic (OvR): 80.2% \n",
+      "Accuracy (train) for L2 logistic (OvR): 80.7% \n",
-      "Accuracy (train) for Linear SVC: 79.1% \n"
+      "Accuracy (train) for Linear SVC: 78.5% \n"
     ]
    }
   ],
--- a/4-Classification/3-Classifiers-2/solution/notebook.ipynb
+++ b/4-Classification/3-Classifiers-2/solution/notebook.ipynb
@@ -0,0 +1,242 @@
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Build Classification Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "   Unnamed: 0 cuisine  almond  angelica  anise  anise_seed  apple  \\\n",
       "0           0  indian       0         0      0           0      0   \n",
       "1           1  indian       1         0      0           0      0   \n",
       "2           2  indian       0         0      0           0      0   \n",
       "3           3  indian       0         0      0           0      0   \n",
       "4           4  indian       0         0      0           0      0   \n",
       "\n",
       "   apple_brandy  apricot  armagnac  ...  whiskey  white_bread  white_wine  \\\n",
       "0             0        0         0  ...        0            0           0   \n",
       "1             0        0         0  ...        0            0           0   \n",
       "2             0        0         0  ...        0            0           0   \n",
       "3             0        0         0  ...        0            0           0   \n",
       "4             0        0         0  ...        0            0           0   \n",
       "\n",
       "   whole_grain_wheat_flour  wine  wood  yam  yeast  yogurt  zucchini  \n",
       "0                        0     0     0    0      0       0         0  \n",
       "1                        0     0     0    0      0       0         0  \n",
       "2                        0     0     0    0      0       0         0  \n",
       "3                        0     0     0    0      0       0         0  \n",
       "4                        0     0     0    0      0       1         0  \n",
       "\n",
       "[5 rows x 382 columns]"
      ],
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Unnamed: 0</th>\n      <th>cuisine</th>\n      <th>almond</th>\n      <th>angelica</th>\n      <th>anise</th>\n      <th>anise_seed</th>\n      <th>apple</th>\n      <th>apple_brandy</th>\n      <th>apricot</th>\n      <th>armagnac</th>\n      <th>...</th>\n      <th>whiskey</th>\n      <th>white_bread</th>\n      <th>white_wine</th>\n      <th>whole_grain_wheat_flour</th>\n      <th>wine</th>\n      <th>wood</th>\n      <th>yam</th>\n      <th>yeast</th>\n      <th>yogurt</th>\n      <th>zucchini</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0</td>\n      <td>indian</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>1</td>\n      <td>indian</td>\n      <td>1</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2</td>\n      <td>indian</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      
<td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>3</td>\n      <td>indian</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>4</td>\n      <td>indian</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>1</td>\n      <td>0</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 382 columns</p>\n</div>"
     },
     "metadata": {},
     "execution_count": 1
    }
   ],
   "source": [
    "import pandas as pd\n",
    "recipes_df = pd.read_csv(\"../../data/cleaned_cuisine.csv\")\n",
    "recipes_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.model_selection import train_test_split, cross_val_score\n",
    "from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve\n",
    "from sklearn.svm import SVC\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "0    indian\n",
       "1    indian\n",
       "2    indian\n",
       "3    indian\n",
       "4    indian\n",
       "Name: cuisine, dtype: object"
      ]
     },
     "metadata": {},
     "execution_count": 3
    }
   ],
   "source": [
    "recipes_label_df = recipes_df['cuisine']\n",
    "recipes_label_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "   almond  angelica  anise  anise_seed  apple  apple_brandy  apricot  \\\n",
       "0       0         0      0           0      0             0        0   \n",
       "1       1         0      0           0      0             0        0   \n",
       "2       0         0      0           0      0             0        0   \n",
       "3       0         0      0           0      0             0        0   \n",
       "4       0         0      0           0      0             0        0   \n",
       "\n",
       "   armagnac  artemisia  artichoke  ...  whiskey  white_bread  white_wine  \\\n",
       "0         0          0          0  ...        0            0           0   \n",
       "1         0          0          0  ...        0            0           0   \n",
       "2         0          0          0  ...        0            0           0   \n",
       "3         0          0          0  ...        0            0           0   \n",
       "4         0          0          0  ...        0            0           0   \n",
       "\n",
       "   whole_grain_wheat_flour  wine  wood  yam  yeast  yogurt  zucchini  \n",
       "0                        0     0     0    0      0       0         0  \n",
       "1                        0     0     0    0      0       0         0  \n",
       "2                        0     0     0    0      0       0         0  \n",
       "3                        0     0     0    0      0       0         0  \n",
       "4                        0     0     0    0      0       1         0  \n",
       "\n",
       "[5 rows x 380 columns]"
      ],
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>almond</th>\n      <th>angelica</th>\n      <th>anise</th>\n      <th>anise_seed</th>\n      <th>apple</th>\n      <th>apple_brandy</th>\n      <th>apricot</th>\n      <th>armagnac</th>\n      <th>artemisia</th>\n      <th>artichoke</th>\n      <th>...</th>\n      <th>whiskey</th>\n      <th>white_bread</th>\n      <th>white_wine</th>\n      <th>whole_grain_wheat_flour</th>\n      <th>wine</th>\n      <th>wood</th>\n      <th>yam</th>\n      <th>yeast</th>\n      <th>yogurt</th>\n      <th>zucchini</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>1</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n  
    <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>...</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>0</td>\n      <td>1</td>\n      <td>0</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 380 columns</p>\n</div>"
     },
     "metadata": {},
     "execution_count": 4
    }
   ],
   "source": [
    "recipes_feature_df = recipes_df.drop(['Unnamed: 0', 'cuisine'], axis=1)\n",
    "recipes_feature_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train, X_test, y_train, y_test = train_test_split(recipes_feature_df, recipes_label_df, test_size=0.3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Try different classifiers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "C = 10\n",
    "# Create different classifiers.\n",
    "classifiers = {\n",
    "    'L1 logistic': LogisticRegression(C=C, penalty='l1',\n",
    "                                      solver='saga',\n",
    "                                      multi_class='multinomial',\n",
    "                                      max_iter=10000),\n",
    "    'L2 logistic (Multinomial)': LogisticRegression(C=C, penalty='l2',\n",
    "                                                    solver='saga',\n",
    "                                                    multi_class='multinomial',\n",
    "                                                    max_iter=10000),\n",
    "    'L2 logistic (OvR)': LogisticRegression(C=C, penalty='l2',\n",
    "                                            solver='saga',\n",
    "                                            multi_class='ovr',\n",
    "                                            max_iter=10000),\n",
    "    'Linear SVC': SVC(kernel='linear', C=C, probability=True,\n",
    "                      random_state=0)\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Accuracy (test) for L1 logistic: 79.8% \n",
      "Accuracy (test) for L2 logistic (Multinomial): 80.2% \n",
      "Accuracy (test) for L2 logistic (OvR): 81.3% \n",
      "Accuracy (test) for Linear SVC: 79.6% \n"
     ]
    }
   ],
   "source": [
    "for name, classifier in classifiers.items():\n",
    "    classifier.fit(X_train, np.ravel(y_train))\n",
    "\n",
    "    # Score on the held-out test split, not the training data.\n",
    "    y_pred = classifier.predict(X_test)\n",
    "    accuracy = accuracy_score(y_test, y_pred)\n",
    "    print(\"Accuracy (test) for %s: %0.1f%% \" % (name, accuracy * 100))"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "dd61f40108e2a19f4ef0d3ebbc6b6eea57ab3c4bc13b15fe6f390d3d86442534"
  },
  "kernelspec": {
   "name": "python37364bit8d3b438fb5fc4430a93ac2cb74d693a7",
   "display_name": "Python 3.7.0 64-bit ('3.7')"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  },
  "metadata": {
   "interpreter": {
    "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
 }
--- a/4-Classification/data/cleaned_cuisine.csv
+++ b/4-Classification/data/cleaned_cuisine.csv