classification 2

3 years ago · 8b2ebdb064
parent f0c4490404
commit 8b2ebdb064
4 changed files with 4183 additions and 45 deletions
--- a/4-Classification/1-Introduction/solution/notebook.ipynb
+++ b/4-Classification/1-Introduction/solution/notebook.ipynb
--- a/4-Classification/2-Classifiers-1/README.md
+++ b/4-Classification/2-Classifiers-1/README.md
@@ -1,44 +1,160 @@
 # Recipe Classifiers 1

-
+In this lesson, you will use the balanced, clean recipe dataset you saved in the last lesson. You will try a variety of classifiers on it to predict a national cuisine from a group of ingredients. Along the way, you'll learn more about how algorithms can be applied to classification tasks.
 ## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/19/)
+## Preparatory steps to start this lesson
+
+Assuming you completed Lesson 1, make sure that a `cleaned_cuisine.csv` file exists in the root `/data` folder for these four lessons.
+
+Working in this lesson's `notebook.ipynb` file, import that file along with the Pandas library:
+
+```python
+import pandas as pd
+# Path is relative to this lesson's folder; the data folder sits one level up
+recipes_df = pd.read_csv("../data/cleaned_cuisine.csv")
+recipes_df.head()
+```
+The data looks like this:
+
+|     | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+| --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
+| 0   | 0          | indian  | 0      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 0      | 0        |
+| 1   | 1          | indian  | 1      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 0      | 0        |
+| 2   | 2          | indian  | 0      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 0      | 0        |
+| 3   | 3          | indian  | 0      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 0      | 0        |
+| 4   | 4          | indian  | 0      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 1      | 0        |
+
+Now, import several more libraries:
+
+```python
+from sklearn.linear_model import LogisticRegression
+from sklearn.model_selection import train_test_split, cross_val_score
+from sklearn.metrics import accuracy_score, precision_score, confusion_matrix, classification_report, precision_recall_curve
+from sklearn.svm import SVC
+import numpy as np
+```
+
+Split the data into features (X) and labels (y), held in two separate objects for training. The `cuisine` column holds the labels:
+
+```python
+recipes_label_df = recipes_df['cuisine']
+recipes_label_df.head()
+```

-Describe what we will learn
+It will look like this:

-### Introduction
+```
+0    indian
+1    indian
+2    indian
+3    indian
+4    indian
+Name: cuisine, dtype: object
+```

-Describe what will be covered
+Drop the `Unnamed: 0` column and the `cuisine` column, and save the rest of the data as trainable features:

-> Notes
+```python
+recipes_feature_df = recipes_df.drop(['Unnamed: 0', 'cuisine'], axis=1)
+recipes_feature_df.head()
+```

-### Prerequisite
+Your features look like this:

-What steps should have been covered before this lesson?
+|     | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | artemisia | artichoke | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+| --- | -----: | -------: | ----: | ---------: | ----: | -----------: | ------: | -------: | --------: | --------: | --- | ------: | ----------: | ---------: | ----------------------: | ---: | ---: | --: | ----: | -----: | -------: |
+| 0   |      0 |        0 |     0 |          0 |     0 |            0 |       0 |        0 |         0 |         0 | ... |       0 |           0 |          0 |                       0 |    0 |    0 |   0 |     0 |      0 |        0 |
+| 1   |      1 |        0 |     0 |          0 |     0 |            0 |       0 |        0 |         0 |         0 | ... |       0 |           0 |          0 |                       0 |    0 |    0 |   0 |     0 |      0 |        0 |
+| 2   |      0 |        0 |     0 |          0 |     0 |            0 |       0 |        0 |         0 |         0 | ... |       0 |           0 |          0 |                       0 |    0 |    0 |   0 |     0 |      0 |        0 |
+| 3   |      0 |        0 |     0 |          0 |     0 |            0 |       0 |        0 |         0 |         0 | ... |       0 |           0 |          0 |                       0 |    0 |    0 |   0 |     0 |      0 |        0 |
+| 4   |      0 |        0 |     0 |          0 |     0 |            0 |       0 |        0 |         0 |         0 | ... |       0 |           0 |          0 |                       0 |    0 |    0 |   0 |     0 |      1 |        0 |

-### Preparation
+Now you are ready to train your model!

-Preparatory steps to start this lesson
+## Choosing your classifier

---
+Now that your data is clean and ready for training, you have to decide which algorithm to use for the job. 

-[Step through content in blocks]
+TODO: discuss the types

-## [Topic 1]
+✅ Todo: knowledge check

-### Task:
+## Train your model

-Work together to progressively enhance your codebase to build the project with shared code:
+Let's train that model. Split your data into training and testing groups:

-```html
-code blocks
+```python
+X_train, X_test, y_train, y_test = train_test_split(recipes_feature_df, recipes_label_df, test_size=0.3)
 ```
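
Because the split is random, the accuracy reported later will vary slightly from run to run. The following is a minimal sketch, on made-up data, of how `random_state` makes a split repeatable and how `stratify` keeps the cuisine proportions equal in both groups. The names `toy_features` and `toy_labels` are hypothetical and not part of the lesson's dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 recipes, 2 ingredient columns, 2 balanced cuisines.
toy_features = pd.DataFrame({
    "ginger": [1, 0, 1, 1, 0, 0, 1, 0, 1, 0],
    "cumin":  [0, 1, 0, 0, 1, 1, 0, 1, 0, 1],
})
toy_labels = pd.Series(["thai", "indian"] * 5, name="cuisine")

# random_state fixes the shuffle; stratify keeps class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_features, toy_labels, test_size=0.4, random_state=0, stratify=toy_labels
)

print(y_tr.value_counts().to_dict())
print(y_te.value_counts().to_dict())
```

Whether to stratify in the lesson itself is a judgment call: the cleaned dataset is already balanced, so a plain random split should stay close to balanced as well.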

-✅ Knowledge Check - use this moment to stretch students' knowledge with open questions
+Use `LogisticRegression` with `multi_class` set to `ovr` (one-vs-rest) and the `lbfgs` solver to train.
+
+✅ Todo: explain ravel
+
+```python
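+lr = LogisticRegression(multi_class='ovr', solver='lbfgs')
+# np.ravel flattens the labels into a 1-D array, the shape fit() expects
+model = lr.fit(X_train, np.ravel(y_train))
+
+# score() returns the mean accuracy on the held-out test data
+accuracy = model.score(X_test, y_test)
+print("Accuracy is {}".format(accuracy))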
+```
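
`np.ravel` flattens its input into a one-dimensional array, which is the shape scikit-learn expects for labels. Since `y_train` is already a one-dimensional Series here, the call mostly acts as a safeguard; it matters when labels arrive as a single-column DataFrame. A small sketch with made-up labels (`labels_2d` is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Hypothetical labels stored as a one-column DataFrame (2-D: 3 rows, 1 column).
labels_2d = pd.DataFrame({"cuisine": ["indian", "thai", "korean"]})

# np.ravel converts to a NumPy array and flattens it to 1-D.
flat = np.ravel(labels_2d)

print(labels_2d.shape)
print(flat.shape)
```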

-## [Topic 2]
+The accuracy is good at around 80%! Your exact number will vary a little, because the train/test split is random.

-## [Topic 3]
+You can see this model in action by testing one row of data (#50):
+
+```python
+print(f'ingredients: {X_test.iloc[50][X_test.iloc[50]!=0].keys()}')
+print(f'cuisine: {y_test.iloc[50]}')
+```
+The result is printed:
+```
+ingredients: Index(['cilantro', 'onion', 'pea', 'potato', 'tomato', 'vegetable_oil'], dtype='object')
+cuisine: indian
+```
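
The expression inside the first `print` relies on a boolean mask: comparing the row to zero yields `True` for each ingredient that is present, and indexing with that mask keeps only those entries. A minimal sketch on a made-up row (`row` and its ingredients are hypothetical):

```python
import pandas as pd

# Hypothetical one-recipe row: 1 means the ingredient is present.
row = pd.Series({"cilantro": 1, "garlic": 0, "onion": 1, "soy_sauce": 0})

# row != 0 builds a True/False mask; indexing with it keeps only the True entries.
present = row[row != 0].keys()

print(list(present))
```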
+
+✅ Try a different row number!
+
+
+Digging deeper, you can check how confident the model is in this prediction:
+
+```python
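+# Select row 50 as a one-row DataFrame so the feature names match the training data
+test = X_test.iloc[[50]]
+proba = model.predict_proba(test)
+classes = model.classes_
+resultdf = pd.DataFrame(data=proba, columns=classes)
+
+# Transpose so each cuisine is a row, then sort by probability, highest first
+topPrediction = resultdf.T.sort_values(by=[0], ascending=[False])
+topPrediction.head()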
+```
+The result is printed. Indian cuisine is the model's best guess, with a probability of about 72%:
+
+|          |        0 |
+| -------: | -------: |
+|   indian | 0.715851 |
+|  chinese | 0.229475 |
+| japanese | 0.029763 |
+|   korean | 0.017277 |
+|     thai | 0.007634 |
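
To see what `predict_proba` returns in isolation, here is a hedged sketch on a tiny made-up dataset. It uses `LogisticRegression` with default settings rather than the lesson's exact configuration, and `toy_X`, `toy_y`, and `toy_model` are hypothetical names. Each row of the result holds one probability per class, in the order given by `classes_`, and the probabilities in a row sum to 1.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two ingredient columns, two cuisines.
toy_X = pd.DataFrame({
    "cumin":     [1, 1, 1, 0, 0, 0],
    "soy_sauce": [0, 0, 1, 1, 1, 0],
})
toy_y = ["indian", "indian", "indian", "chinese", "chinese", "chinese"]

toy_model = LogisticRegression().fit(toy_X, toy_y)

# Keep the row as a one-row DataFrame so the column names match the training data.
proba = toy_model.predict_proba(toy_X.iloc[[0]])

# Pair each probability with its class name and rank them, highest first.
ranked = pd.Series(proba[0], index=toy_model.classes_).sort_values(ascending=False)
print(ranked)
```

The top-ranked class is the same label that `predict` would return for that row.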
+
+✅ Can you explain why the model is pretty sure this is an Indian recipe?
+
+Get more detail by printing a classification report, as you did in the Regression lessons:
+
+```python
+y_pred = model.predict(X_test)
+print(classification_report(y_test,y_pred))
+```

+|              | precision | recall | f1-score | support |
+| ------------ | --------- | ------ | -------- | ------- |
+| chinese      | 0.73      | 0.71   | 0.72     | 229     |
+| indian       | 0.91      | 0.93   | 0.92     | 254     |
+| japanese     | 0.70      | 0.75   | 0.72     | 220     |
+| korean       | 0.86      | 0.76   | 0.81     | 242     |
+| thai         | 0.79      | 0.85   | 0.82     | 254     |
+| accuracy     |           |        | 0.80     | 1199    |
+| macro avg    | 0.80      | 0.80   | 0.80     | 1199    |
+| weighted avg | 0.80      | 0.80   | 0.80     | 1199    |
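
The precision and recall columns in the report can be derived from a confusion matrix. The sketch below works through that on six made-up predictions; `true_labels`, `pred_labels`, and `names` are hypothetical and unrelated to the lesson's results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for six recipes.
true_labels = ["indian", "indian", "thai", "thai", "thai", "korean"]
pred_labels = ["indian", "thai", "thai", "thai", "korean", "korean"]
names = ["indian", "korean", "thai"]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(true_labels, pred_labels, labels=names)

correct = np.diag(cm)
precision = correct / cm.sum(axis=0)  # correct / everything predicted as the class
recall = correct / cm.sum(axis=1)     # correct / everything truly in the class

for name, p, r in zip(names, precision, recall):
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Precision asks "of the recipes labeled Korean, how many really were?", while recall asks "of the truly Korean recipes, how many did the model find?".
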
 ## 🚀Challenge

 Add a challenge for students to work on collaboratively in class to enhance the project
--- a/4-Classification/2-Classifiers-1/notebook.ipynb
+++ b/4-Classification/2-Classifiers-1/notebook.ipynb
@@ -0,0 +1,28 @@
+{
+ "metadata": {
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": 3
+  },
+  "orig_nbformat": 2
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+  {
+   "source": [
+    "# Build Classification Models"
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  }
+ ]
+}
--- a/4-Classification/data/cleaned_cuisine.csv
+++ b/4-Classification/data/cleaned_cuisine.csv