# Introduction to classification

In these four lessons, you will explore a fundamental focus of classic machine learning - _classification_. We will walk through using various classification algorithms with a dataset about all the brilliant cuisines of Asia and India. Hope you're hungry!

> Celebrate pan-Asian cuisines in these lessons! Image by [Jen Looper](https://twitter.com/jenlooper)

Classification is a form of [supervised learning](https://wikipedia.org/wiki/Supervised_learning) that bears a lot in common with regression techniques. If machine learning is all about predicting values or assigning names to things by using datasets, then classification generally falls into two groups: _binary classification_ and _multiclass classification_.

[](https://youtu.be/eg8DJYwdMyg "Introduction to classification")

> 🎥 Click the image above for a video: MIT's John Guttag introduces classification

Remember:

- **Linear regression** helped you predict relationships between variables and make accurate predictions on where a new datapoint would fall in relationship to that line. So, you could predict _what price a pumpkin would be in September vs. December_, for example.
- **Logistic regression** helped you discover "binary categories": at this price point, _is this pumpkin orange or not-orange_?

Classification uses various algorithms to determine a data point's label or class. Let's work with this cuisine data to see whether, by observing a group of ingredients, we can determine its cuisine of origin.

## [Pre-lecture quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/19/?loc=br)

> ### [This lesson is available in R!](../solution/R/lesson_10-R.ipynb)

### Introduction

Classification is one of the fundamental activities of the machine learning researcher and data scientist. From basic classification of a binary value ("is this email spam or not?") to complex image classification and segmentation using computer vision, it's always useful to be able to sort data into classes and ask questions of it.

To state the process in a more scientific way, your classification method creates a predictive model that enables you to map the relationship between input variables and output variables.

> Binary vs. multiclass problems for classification algorithms to handle. Infographic by [Jen Looper](https://twitter.com/jenlooper)

Before starting the process of cleaning our data, visualizing it, and prepping it for our ML tasks, let's learn a bit about the various ways machine learning can be leveraged to classify data.

Derived from [statistics](https://wikipedia.org/wiki/Statistical_classification), classification using classic machine learning uses features, such as `smoker`, `weight`, and `age`, to determine the _likelihood of developing X disease_. As a supervised learning technique similar to the regression exercises you performed earlier, your data is labeled, and the ML algorithms use those labels to classify and predict classes of a dataset and assign them to a group or outcome.
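
To make that concrete, here is a minimal sketch with invented data (the `smoker`/`weight`/`age` features above; `DecisionTreeClassifier` is just one possible choice of algorithm) showing how labeled features train a classifier:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# A tiny, invented labeled dataset: three features plus a known outcome
data = pd.DataFrame({
    'smoker':    [1, 0, 1, 0, 1, 0],
    'weight':    [90, 70, 85, 60, 95, 65],
    'age':       [50, 30, 60, 25, 55, 40],
    'disease_x': [1, 0, 1, 0, 1, 0]  # the label the algorithm learns from
})

X = data[['smoker', 'weight', 'age']]  # input variables (features)
y = data['disease_x']                  # output variable (class label)

model = DecisionTreeClassifier().fit(X, y)

# Predict the class of a new, unseen person
new_person = pd.DataFrame([[1, 80, 52]], columns=X.columns)
print(model.predict(new_person))
```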

✅ Take a moment to imagine a dataset about cuisines. What would a multiclass model be able to answer? What would a binary model be able to answer? What if you wanted to determine whether a given cuisine was likely to use fenugreek? What if you wanted to see if, given a grocery bag full of star anise, artichokes, cauliflower, and horseradish, you could create a typical Indian dish?

[](https://youtu.be/GuTeDbaNoEU "Crazy mystery baskets")

> 🎥 Click the image above for a video. The whole premise of the show 'Chopped' is the 'mystery basket' where chefs have to make some dish out of a random choice of ingredients. Surely an ML model would have helped!

## Hello 'classifier'

The question we want to ask of this cuisine dataset is actually a **multiclass question**, as we have several potential national cuisines to work with. Given a batch of ingredients, which of these many classes will the data fit?

Scikit-learn offers several different algorithms to use to classify data, depending on the kind of problem you want to solve. In the next two lessons, you'll learn about several of these algorithms.

## Exercise - clean and balance your data

The first task at hand, before starting this project, is to clean and **balance** your data to get better results. Start with the blank _notebook.ipynb_ file in the root of this folder.

The first thing to install is [imblearn](https://imbalanced-learn.org/stable/). This is a Scikit-learn-compatible package that will allow you to better balance the data (you will learn more about this task in a minute).

1. To install `imblearn`, run `pip install`, like so:

    ```python
    pip install imblearn
    ```

1. Import the packages you need to import your data and visualize it; also import `SMOTE` from `imblearn`.

    ```python
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    import numpy as np
    from imblearn.over_sampling import SMOTE
    ```

    Now you are set up to import the data.

1. The next task will be to import the data:

    ```python
    df  = pd.read_csv('../data/cuisines.csv')
    ```

    Using `read_csv()` will read the content of the csv file _cuisines.csv_ and place it in the variable `df`.

1. Check the first five rows of the data by calling `head()`:

    ```python
    df.head()
    ```

    The first five rows look like this:

    ```output
    |     | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
    | --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
    | 0   | 65         | indian  | 0      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 0      | 0        |
    | 1   | 66         | indian  | 1      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 0      | 0        |
    | 2   | 67         | indian  | 0      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 0      | 0        |
    | 3   | 68         | indian  | 0      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 0      | 0        |
    | 4   | 69         | indian  | 0      | 0        | 0     | 0          | 0     | 0            | 0       | 0        | ... | 0       | 0           | 0          | 0                       | 0    | 0    | 0   | 0     | 1      | 0        |
    ```

1. Get info about this data by calling `info()`:

    ```python
    df.info()
    ```

    Your output resembles:

    ```output
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 2448 entries, 0 to 2447
    Columns: 385 entries, Unnamed: 0 to zucchini
    dtypes: int64(384), object(1)
    memory usage: 7.2+ MB
    ```

## Exercise - learning about cuisines

Now the work starts to become more interesting. Let's discover the distribution of data, per cuisine.

1. Plot the data as bars by calling `barh()`:

    ```python
    df.cuisine.value_counts().plot.barh()
    ```

    There are a finite number of cuisines, but the distribution of data is uneven. You can fix that! Before doing so, explore a little more.

1. Find out how much data is available per cuisine and print it out:

    ```python
    thai_df = df[(df.cuisine == "thai")]
    japanese_df = df[(df.cuisine == "japanese")]
    chinese_df = df[(df.cuisine == "chinese")]
    indian_df = df[(df.cuisine == "indian")]
    korean_df = df[(df.cuisine == "korean")]

    print(f'thai df: {thai_df.shape}')
    print(f'japanese df: {japanese_df.shape}')
    print(f'chinese df: {chinese_df.shape}')
    print(f'indian df: {indian_df.shape}')
    print(f'korean df: {korean_df.shape}')
    ```

    The output looks like so:

    ```output
    thai df: (289, 385)
    japanese df: (320, 385)
    chinese df: (442, 385)
    indian df: (598, 385)
    korean df: (799, 385)
    ```

## Discovering ingredients

Now you can dig deeper into the data and learn what the typical ingredients per cuisine are. You should clean out recurrent data that creates confusion between cuisines, so let's learn about this problem.

1. Create a function `create_ingredient_df()` in Python to create an ingredient dataframe. This function will start by dropping an unhelpful column and sorting ingredients by their count:

    ```python
    def create_ingredient_df(df):
        # Transpose so each ingredient becomes a row, drop the non-ingredient
        # rows, and sum each ingredient's occurrences into a 'value' column
        ingredient_df = df.T.drop(['cuisine','Unnamed: 0']).sum(axis=1).to_frame('value')
        # Keep only ingredients that appear at least once in this cuisine
        ingredient_df = ingredient_df[(ingredient_df.T != 0).any()]
        # Sort from most to least frequent
        ingredient_df = ingredient_df.sort_values(by='value', ascending=False, inplace=False)
        return ingredient_df
    ```

Now you can use that function to get an idea of the top ten most popular ingredients by cuisine.

1. Call `create_ingredient_df()` and plot the result by calling `barh()`:

    ```python
    thai_ingredient_df = create_ingredient_df(thai_df)
    thai_ingredient_df.head(10).plot.barh()
    ```

1. Do the same for the Japanese data:

    ```python
    japanese_ingredient_df = create_ingredient_df(japanese_df)
    japanese_ingredient_df.head(10).plot.barh()
    ```

1. Now for the Chinese ingredients:

    ```python
    chinese_ingredient_df = create_ingredient_df(chinese_df)
    chinese_ingredient_df.head(10).plot.barh()
    ```

1. Plot the Indian ingredients:

    ```python
    indian_ingredient_df = create_ingredient_df(indian_df)
    indian_ingredient_df.head(10).plot.barh()
    ```

1. Finally, plot the Korean ingredients:

    ```python
    korean_ingredient_df = create_ingredient_df(korean_df)
    korean_ingredient_df.head(10).plot.barh()
    ```

1. Now, drop the most common ingredients that create confusion between distinct cuisines, by calling `drop()`:

    Everyone loves rice, garlic and ginger!

    ```python
    feature_df = df.drop(['cuisine','Unnamed: 0','rice','garlic','ginger'], axis=1)
    labels_df = df.cuisine #.unique()
    feature_df.head()
    ```

## Balance the dataset

Now that you have cleaned the data, use [SMOTE](https://imbalanced-learn.org/dev/references/generated/imblearn.over_sampling.SMOTE.html) - "Synthetic Minority Over-sampling Technique" - to balance it.

1. Call `fit_resample()`; this strategy generates new samples by interpolation.

    ```python
    oversample = SMOTE()
    transformed_feature_df, transformed_label_df = oversample.fit_resample(feature_df, labels_df)
    ```
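
    Conceptually, SMOTE synthesizes each new minority-class sample by interpolating between an existing sample and one of its nearest neighbors. A standard description of the technique (background, not lesson code) is:

    ```latex
    x_{\text{new}} = x_i + \lambda \, (x_{\text{neighbor}} - x_i), \qquad \lambda \in [0, 1]
    ```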

By balancing your data, you'll have better results when classifying it. Think about a binary classification: if most of your data belongs to one class, an ML model is going to predict that class more frequently, just because there is more data for it. Balancing the data removes this skew.
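
As a quick illustration of that failure mode (with invented 90/10 data; scikit-learn's `DummyClassifier` plays the role of a lazy model), a classifier that always predicts the majority class already scores 90% accuracy while never finding the minority class:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Invented, heavily imbalanced labels: 90 samples of class 0, 10 of class 1
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((100, 1))  # the features are irrelevant for this baseline

baseline = DummyClassifier(strategy='most_frequent').fit(X, y)
print(accuracy_score(y, baseline.predict(X)))  # 0.9 - high, yet useless for class 1
```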

1. Now you can check the number of labels per cuisine:

    ```python
    print(f'new label count: {transformed_label_df.value_counts()}')
    print(f'old label count: {df.cuisine.value_counts()}')
    ```

    Your output looks like so:

    ```output
    new label count: korean      799
    chinese     799
    indian      799
    japanese    799
    thai        799
    Name: cuisine, dtype: int64
    old label count: korean      799
    indian      598
    chinese     442
    japanese    320
    thai        289
    Name: cuisine, dtype: int64
    ```

    The data is nice and clean, balanced, and very delicious!

1. The last step is to save your balanced data, including labels and features, into a new dataframe that can be exported into a file:

    ```python
    transformed_df = pd.concat([transformed_label_df,transformed_feature_df],axis=1, join='outer')
    ```

1. You can take one more look at the data using `transformed_df.head()` and `transformed_df.info()`. Save a copy of this data for use in future lessons:

    ```python
    transformed_df.head()
    transformed_df.info()
    transformed_df.to_csv("../data/cleaned_cuisines.csv")
    ```

    This fresh CSV can now be found in the root data folder.

---

## 🚀Challenge

This curriculum contains several interesting datasets. Dig through the `data` folders and see if any contain datasets that would be appropriate for binary or multiclass classification. What questions would you ask of these datasets?

## [Post-lecture quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/20?loc=br)

## Review & Self Study

Explore SMOTE's API. What use cases is it best suited for? What problems does it solve?

## Assignment

[Explore classification methods](assignment.pt-br.md)

# Explore classification methods

## Instructions

In the [Scikit-learn documentation](https://scikit-learn.org/stable/supervised_learning.html) you'll find a large list of ways to classify data. Do a little scavenger hunt in these docs: your goal is to look for classification methods and match each with a dataset in this curriculum, a question you can ask of it, and a technique of classification. Create a spreadsheet or table in a .doc file and explain how the dataset would work with the classification algorithm.

## Rubric

| Criteria | Exemplary | Adequate | Needs improvement |
| -------- | --------- | -------- | ----------------- |
| | A document is presented overviewing 5 algorithms alongside a classification technique. The overview is well-explained and detailed. | A document is presented overviewing 3 algorithms alongside a classification technique. The overview is well-explained and detailed. | A document is presented overviewing fewer than three algorithms alongside a classification technique and the overview is neither well-explained nor detailed. |

# Study the solvers

## Instructions

In this lesson you learned about the various solvers that pair algorithms with a machine learning process to create an accurate model. Walk through the solvers listed in the lesson and pick two. In your own words, compare and contrast these two solvers. What kind of problem do they address? How do they work with various data structures? Why would you pick one over another?

## Rubric

| Criteria | Exemplary | Adequate | Needs improvement |
| -------- | --------- | -------- | ----------------- |
| | A .doc file is presented with two paragraphs, one on each solver, comparing them thoughtfully. | A .doc file is presented with only one paragraph | The assignment is incomplete |

# Cuisine classifiers 2

In this second classification lesson, you will explore more ways to classify numeric data. You will also learn about the ramifications of choosing one classifier over another.

## [Pre-lecture quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/23?loc=br)

### Prerequisite

We assume that you have completed the previous lessons and have a cleaned dataset in your `data` folder called _cleaned_cuisines.csv_ in the root of this 4-lesson folder.

### Preparation

We have loaded your _notebook.ipynb_ file with the cleaned dataset and have divided it into X and y dataframes, ready for the model building process.

## A classification map

Previously, you learned about the various options you have when classifying data using Microsoft's cheat sheet. Scikit-learn offers a similar, but more granular cheat sheet that can further help narrow down your estimators (another term for classifiers):

> Tip: [visit this map online](https://scikit-learn.org/stable/tutorial/machine_learning_map/) and click along the path to read documentation.

### The plan

This map is very helpful once you have a clear grasp of your data, as you can 'walk' along its paths to a decision:

- We have >50 samples
- We want to predict a category
- We have labeled data
- We have fewer than 100K samples
- ✨ We can choose a Linear SVC
- If that doesn't work, since we have numeric data
  - We can try a ✨ KNeighbors Classifier
    - If that doesn't work, try ✨ SVC and ✨ Ensemble Classifiers

This is a very helpful trail to follow.

## Exercise - split the data

Following this path, we should start by importing some libraries to use.

1. Import the needed libraries:

    ```python
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.metrics import accuracy_score, precision_score, confusion_matrix, classification_report, precision_recall_curve
    import numpy as np
    ```

1. Split your training and test data:

    ```python
    X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)
    ```

## Linear SVC classifier

Support-Vector classification (SVC) is a child of the Support-Vector Machines family of ML techniques (learn more about these below). In this method, you can choose a 'kernel' to decide how to separate the labels. The 'C' parameter refers to 'regularization', which regulates the influence of parameters. The kernel can be one of [several](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC); here we set it to 'linear' to ensure that we leverage linear SVC. `probability` defaults to `False`; here we set it to `True` to gather probability estimates. We set the random state to '0' so the shuffling used for those probability estimates is reproducible.

### Exercise - apply a linear SVC

Start by creating an array of classifiers. You will add progressively to this array as we test.

1. Start with a Linear SVC:

    ```python
    C = 10
    # Create different classifiers.
    classifiers = {
        'Linear SVC': SVC(kernel='linear', C=C, probability=True, random_state=0)
    }
    ```

2. Train your model using the Linear SVC and print out a report:

    ```python
    n_classifiers = len(classifiers)

    for index, (name, classifier) in enumerate(classifiers.items()):
        classifier.fit(X_train, np.ravel(y_train))

        y_pred = classifier.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        # note: despite the "(train)" label in the string, this accuracy is measured on the test set
        print("Accuracy (train) for %s: %0.1f%% " % (name, accuracy * 100))
        print(classification_report(y_test, y_pred))
    ```

    The result is pretty good:

    ```output
    Accuracy (train) for Linear SVC: 78.6%
                  precision    recall  f1-score   support

         chinese       0.71      0.67      0.69       242
          indian       0.88      0.86      0.87       234
        japanese       0.79      0.74      0.76       254
          korean       0.85      0.81      0.83       242
            thai       0.71      0.86      0.78       227

        accuracy                           0.79      1199
       macro avg       0.79      0.79      0.79      1199
    weighted avg       0.79      0.79      0.79      1199
    ```

## K-Neighbors classifier

K-Neighbors is part of the "neighbors" family of ML methods, which can be used for both supervised and unsupervised learning. In this method, a data point's label is predicted from the labels of its 'k' nearest neighbors in the training data, rather than from an explicit model of each class.
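
To get a feel for how the choice of 'k' affects this classifier, here is a small exploratory sketch (not part of the lesson's required code) that reuses the `X_train`/`y_train` split and the `cross_val_score` import from above:

```python
# Compare cross-validated accuracy for a few neighbor counts
for k in [3, 5, 10, 20]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, np.ravel(y_train), cv=3)
    print(f'k={k}: mean accuracy {scores.mean():.3f}')
```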

### Exercise - apply the K-Neighbors classifier

The previous classifier was good, and worked well with the data, but maybe we can get better accuracy. Try a K-Neighbors classifier.

1. Add a line to your classifier array (add a comma after the Linear SVC item):

    ```python
    'KNN classifier': KNeighborsClassifier(C),  # note: C (=10) is passed as n_neighbors here
    ```

    The result is a little worse:

    ```output
    Accuracy (train) for KNN classifier: 73.8%
                  precision    recall  f1-score   support

         chinese       0.64      0.67      0.66       242
          indian       0.86      0.78      0.82       234
        japanese       0.66      0.83      0.74       254
          korean       0.94      0.58      0.72       242
            thai       0.71      0.82      0.76       227

        accuracy                           0.74      1199
       macro avg       0.76      0.74      0.74      1199
    weighted avg       0.76      0.74      0.74      1199
    ```

✅ Learn about [K-Neighbors](https://scikit-learn.org/stable/modules/neighbors.html#neighbors)

## Support Vector Classifier

Support-Vector classifiers are part of the [Support-Vector Machine](https://wikipedia.org/wiki/Support-vector_machine) family of ML methods that are used for classification and regression tasks. SVMs "map training examples to points in space" to maximize the distance between two categories. Subsequent data is mapped into this space so its category can be predicted.
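
As background (a standard formulation, not lesson material), the hard-margin version of that idea can be written as an optimization problem: find the separating hyperplane `w · x + b = 0` with the widest margin around it:

```latex
\min_{w,\,b} \; \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1 \;\; \text{for every training pair } (x_i, y_i)
```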

### Exercise - apply a Support Vector Classifier

Let's try for a little better accuracy with a Support Vector Classifier.

1. Add a comma after the K-Neighbors item, and then add this line:

    ```python
    'SVC': SVC(),
    ```

    The result is quite good!

    ```output
    Accuracy (train) for SVC: 83.2%
                  precision    recall  f1-score   support

         chinese       0.79      0.74      0.76       242
          indian       0.88      0.90      0.89       234
        japanese       0.87      0.81      0.84       254
          korean       0.91      0.82      0.86       242
            thai       0.74      0.90      0.81       227

        accuracy                           0.83      1199
       macro avg       0.84      0.83      0.83      1199
    weighted avg       0.84      0.83      0.83      1199
    ```

✅ Learn about [Support-Vectors](https://scikit-learn.org/stable/modules/svm.html#svm)

## Ensemble Classifiers

Let's follow the path to the very end, even though the previous test was quite good. Let's try some 'Ensemble Classifiers', specifically Random Forest and AdaBoost:

```python
'RFST': RandomForestClassifier(n_estimators=100),
'ADA': AdaBoostClassifier(n_estimators=100)
```

The result is very good, especially for Random Forest:

```output
Accuracy (train) for RFST: 84.5%
              precision    recall  f1-score   support

     chinese       0.80      0.77      0.78       242
      indian       0.89      0.92      0.90       234
    japanese       0.86      0.84      0.85       254
      korean       0.88      0.83      0.85       242
        thai       0.80      0.87      0.83       227

    accuracy                           0.84      1199
   macro avg       0.85      0.85      0.84      1199
weighted avg       0.85      0.84      0.84      1199

Accuracy (train) for ADA: 72.4%
              precision    recall  f1-score   support

     chinese       0.64      0.49      0.56       242
      indian       0.91      0.83      0.87       234
    japanese       0.68      0.69      0.69       254
      korean       0.73      0.79      0.76       242
        thai       0.67      0.83      0.74       227

    accuracy                           0.72      1199
   macro avg       0.73      0.73      0.72      1199
weighted avg       0.73      0.72      0.72      1199
```

✅ Learn about [Ensemble Classifiers](https://scikit-learn.org/stable/modules/ensemble.html)

This method of machine learning "combines the predictions of several base estimators" to improve the model's quality. In our example, we used Random Forest and AdaBoost.

- [Random Forest](https://scikit-learn.org/stable/modules/ensemble.html#forest), an averaging method, builds a 'forest' of 'decision trees' infused with randomness to avoid overfitting. The `n_estimators` parameter is set to the number of trees (see the sketch after this list).

- [AdaBoost](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html) fits a classifier to a dataset and then fits copies of that classifier to the same dataset. It focuses on the weights of incorrectly classified items and adjusts the fit for the next classifier to correct them.
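
One nice property of the random forest: once fitted, it reports how much each feature contributed to its decisions. Here is a small exploratory sketch (assuming the `'RFST'` entry from the classifier dictionary above has been fitted by the training loop, and that `X_train` is a dataframe of ingredient columns):

```python
import pandas as pd

# Which ingredients did the fitted random forest rely on most?
rfst = classifiers['RFST']
importances = pd.Series(rfst.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(10))
```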

---

## 🚀Challenge

Each of these techniques has a large number of parameters that you can tweak. Research each one's default parameters and think about what tweaking these parameters would mean for the model's quality.

## [Post-lecture quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/24?loc=br)

## Review & Self Study

There's a lot of jargon in these lessons, so take a minute to review [this list](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-15963-cxa) of useful terminology!

## Assignment

[Parameter play](assignment.pt-br.md).

# Parameter Play

## Instructions

There are a lot of parameters that are set by default when working with these classifiers. Intellisense in VS Code can help you dig into them. Adopt one of the ML classification techniques in this lesson and retrain models, tweaking various parameter values. Build a notebook explaining why some changes help the model quality while others degrade it. Be detailed in your answer.

## Rubric

| Criteria | Exemplary | Adequate | Needs improvement |
| -------- | --------- | -------- | ----------------- |
| | A notebook is presented with a classifier fully built up, its parameters tweaked, and changes explained in textboxes | A notebook is partially presented or poorly explained | A notebook is buggy or flawed |

# Build a Cuisine Recommender Web App

In this lesson, you will build a classification model using some of the techniques you have learned in previous lessons and with the delicious cuisine dataset used throughout this series. In addition, you will build a small web app to use a saved model, leveraging Onnx's web runtime.

One of the most useful practical uses of machine learning is building recommendation systems, and you can take the first step in that direction today!

[](https://youtu.be/giIXNoiqO_U "Recommendation Systems Introduction")

> 🎥 Click the image above for a video: Andrew Ng introduces recommendation system design

## [Pre-lecture quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/25?loc=br)

In this lesson you will learn:

- How to build a model and save it as an Onnx model
- How to use Netron to inspect the model
- How to use your model in a web app for inference

## Build your model

Building applied ML systems is an important part of leveraging these technologies for your business systems. You can use models within your web applications (and thus use them in an offline context if needed) by using Onnx.

In a [previous lesson](../../3-Web-App/1-Web-App/README.md), you built a regression model about UFO sightings, "pickled" it, and used it in a Flask app. While this architecture is very useful to know, it is a full-stack Python app, and your requirements may include the use of a JavaScript application.

In this lesson, you can build a basic JavaScript-based system for inference. First, however, you need to train a model and convert it for use with Onnx.

## Exercise - train classification model

First, train a classification model using the cleaned cuisines dataset from the previous lessons.

1. Start by importing useful libraries:

    ```python
    !pip install skl2onnx
    import pandas as pd
    ```

    You need '[skl2onnx](https://onnx.ai/sklearn-onnx/)' to help convert your Scikit-learn model to Onnx format.

1. Then, work with your data in the same way you did in previous lessons, by reading a CSV file using `read_csv()`:

    ```python
    data = pd.read_csv('../data/cleaned_cuisines.csv')
    data.head()
    ```

1. Remove the first two unnecessary columns and save the remaining data as 'X':

    ```python
    X = data.iloc[:,2:]
    X.head()
    ```

1. Save the labels as 'y':

    ```python
    y = data[['cuisine']]
    y.head()
    ```

### Commence the training routine

We will use the SVC classifier, which delivered good accuracy in the previous lesson.

1. Import the appropriate libraries from Scikit-learn:

    ```python
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score
    from sklearn.metrics import accuracy_score, precision_score, confusion_matrix, classification_report
    ```

1. Separate training and test sets:

    ```python
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
    ```

1. Build an SVC classification model as you did in the previous lesson:

    ```python
    model = SVC(kernel='linear', C=10, probability=True, random_state=0)
    model.fit(X_train, y_train.values.ravel())
    ```

1. Now, test your model, calling `predict()`:

    ```python
    y_pred = model.predict(X_test)
    ```

1. Print out a classification report to check the model's quality:

    ```python
    print(classification_report(y_test, y_pred))
    ```

    As we saw before, the accuracy is good:

    ```output
                  precision    recall  f1-score   support

         chinese       0.72      0.69      0.70       257
          indian       0.91      0.87      0.89       243
        japanese       0.79      0.77      0.78       239
          korean       0.83      0.79      0.81       236
            thai       0.72      0.84      0.78       224

        accuracy                           0.79      1199
       macro avg       0.79      0.79      0.79      1199
    weighted avg       0.79      0.79      0.79      1199
    ```

### Convert your model to Onnx

Make sure to do the conversion with the proper tensor number. This dataset has 380 ingredients listed, so you need to notate that number in `FloatTensorType`:

1. Convert using a tensor number of 380.

    ```python
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType

    initial_type = [('float_input', FloatTensorType([None, 380]))]
    options = {id(model): {'nocl': True, 'zipmap': False}}
    ```

1. Create the `onx` object and store it as a file **model.onnx**:

    ```python
    onx = convert_sklearn(model, initial_types=initial_type, options=options)
    with open("./model.onnx", "wb") as f:
        f.write(onx.SerializeToString())
    ```

    > Note, you can pass in [options](https://onnx.ai/sklearn-onnx/parameterized.html) in your conversion script. In this case, we passed in 'nocl' to be True and 'zipmap' to be False. Since this is a classification model, you have the option to remove ZipMap, which produces a list of dictionaries (not necessary). `nocl` refers to class information being included in the model. Reduce your model's size by setting `nocl` to 'True'.

Running the entire notebook will now build an Onnx model and save it to this folder.

## View your model

Onnx models are not very visible in Visual Studio Code, but there's very good free software that many researchers use to visualize models and verify they are properly built. Download [Netron](https://github.com/lutzroeder/Netron) and open your model.onnx file. You can see your simple model visualized, with its 380 inputs and classifier listed.

Netron is a helpful tool to view your models.

Now you are ready to use this neat model in a web app. Let's build an app that will come in handy when you look in your refrigerator and try to figure out which combination of your leftover ingredients you can use to cook a given cuisine, as determined by your model.

## Build a recommender web application

You can use your model directly in a web app. This architecture also allows you to run it locally and even offline if needed. Start by creating an `index.html` file in the same folder where you stored your `model.onnx` file.

1. In this file _index.html_, add the following markup:

    ```html
    <!DOCTYPE html>
    <html>
        <head>
            <title>Cuisine Matcher</title>
        </head>
        <body>
            ...
        </body>
    </html>
    ```

1. Now, working within the `body` tags, add a little markup to show a list of checkboxes reflecting some ingredients:

    ```html
    <h1>Check your refrigerator. What can you create?</h1>
    <div id="wrapper">
        <div class="boxCont">
            <input type="checkbox" value="4" class="checkbox">
            <label>apple</label>
        </div>

        <div class="boxCont">
            <input type="checkbox" value="247" class="checkbox">
            <label>pear</label>
        </div>

        <div class="boxCont">
            <input type="checkbox" value="77" class="checkbox">
            <label>cherry</label>
        </div>

        <div class="boxCont">
            <input type="checkbox" value="126" class="checkbox">
            <label>fenugreek</label>
        </div>

        <div class="boxCont">
            <input type="checkbox" value="302" class="checkbox">
            <label>sake</label>
        </div>

        <div class="boxCont">
            <input type="checkbox" value="327" class="checkbox">
            <label>soy sauce</label>
        </div>

        <div class="boxCont">
            <input type="checkbox" value="112" class="checkbox">
            <label>cumin</label>
        </div>
    </div>
    <div style="padding-top:10px">
        <button onClick="startInference()">What kind of cuisine can you make?</button>
    </div>
    ```

    Notice that each checkbox is given a value. This reflects the index where the ingredient is found according to the dataset. Apple, for example, in this alphabetic list, occupies the fifth column, so its value is '4' since we start counting at 0. You can consult the [ingredients spreadsheet](../data/ingredient_indexes.csv) to discover a given ingredient's index.
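
    If you'd rather not count columns by hand, a quick sketch like the following can recover an ingredient's index (this assumes the column layout used above, where `X = data.iloc[:,2:]` means the ingredient columns start at position 2):

    ```python
    import pandas as pd

    data = pd.read_csv('../data/cleaned_cuisines.csv')
    ingredient_columns = list(data.columns[2:])  # ingredient columns, in dataset order
    print(ingredient_columns.index('apple'))     # 4 -> the checkbox value for apple
    ```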

Continuing your work in the _index.html_ file, add a script block where the model is called after the final closing `</div>`.

1. First, import the [Onnx Runtime](https://www.onnxruntime.ai/):

    ```html
    <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web@1.9.0/dist/ort.min.js"></script>
    ```

    > Onnx Runtime is used to enable running your Onnx models across a wide range of hardware platforms, including optimizations and an API to use.

1. Once the Runtime is in place, you can call it:

    ```javascript
    <script>
        // 380 slots, one per ingredient column; 1 means the ingredient is checked
        const ingredients = new Array(380).fill(0);

        const checks = [].slice.call(document.querySelectorAll('.checkbox'));

        // use an async context to call onnxruntime functions.
        function init() {

            checks.forEach(function (checkbox, index) {
                checkbox.onchange = function () {
                    if (this.checked) {
                        var index = checkbox.value;

                        if (index !== -1) {
                            ingredients[index] = 1;
                        }
                        console.log(ingredients)
                    }
                    else {
                        var index = checkbox.value;

                        if (index !== -1) {
                            ingredients[index] = 0;
                        }
                        console.log(ingredients)
                    }
                }
            })
        }

        function testCheckboxes() {
            // return true if at least one checkbox is checked
            for (var i = 0; i < checks.length; i++)
                if (checks[i].type == "checkbox")
                    if (checks[i].checked)
                        return true;
            return false;
        }

        async function startInference() {

            let checked = testCheckboxes()

            if (checked) {

                try {
                    // create a new session and load the model.
                    const session = await ort.InferenceSession.create('./model.onnx');

                    const input = new ort.Tensor(new Float32Array(ingredients), [1, 380]);
                    const feeds = { float_input: input };

                    // feed inputs and run
                    const results = await session.run(feeds);

                    // read from results
                    alert('You can enjoy ' + results.label.data[0] + ' cuisine today!')

                } catch (e) {
                    console.log(`failed to inference ONNX model: ${e}.`);
                }
            }
            else alert("Please check an ingredient")

        }
        init();

    </script>
    ```

In this code, there are several things happening:

1. You created an array of 380 possible values (1 or 0) to be set and sent to the model for inference, depending on whether an ingredient checkbox is checked.
2. You created an array of checkboxes and a way to determine whether they were checked in an `init` function that is called when the application starts. When a checkbox is checked, the `ingredients` array is altered to reflect the chosen ingredient.
3. You created a `testCheckboxes` function that checks whether any checkbox was checked.
4. You use that function when the button is pressed and, if any checkbox is checked, you start inference.
5. The inference routine includes:
   1. Setting up an asynchronous load of the model
   2. Creating a Tensor structure to send to the model
   3. Creating 'feeds' that reflect the `float_input` input that you created when training your model (you can use Netron to verify that name)
   4. Sending these 'feeds' to the model and waiting for a response

## Test your application

Open a terminal session in Visual Studio Code in the folder where your index.html file resides. Ensure that you have [http-server](https://www.npmjs.com/package/http-server) installed globally, and type `http-server` at the prompt. A localhost server should open, and you can view your web app. Check what cuisine is recommended based on various ingredients.

Congratulations, you have created a 'recommendation' web app with a few fields. Take some time to build out this system!

---

## 🚀Challenge

Your web app is very minimal, so continue to build it out using ingredients and their indexes from the [ingredient_indexes](../data/ingredient_indexes.csv) data. What flavor combinations work to create a given national dish?

## [Post-lecture quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/26?loc=br)

## Review & Self Study

While this lesson just touched on the utility of creating a recommendation system for food ingredients, this area of ML applications is very rich in examples. Read some more about how these systems are built:

- https://www.sciencedirect.com/topics/computer-science/recommendation-engine
- https://www.technologyreview.com/2014/08/25/171547/the-ultimate-challenge-for-recommendation-engines/
- https://www.technologyreview.com/2015/03/23/168831/everything-is-a-recommendation/

## Assignment

[Build a new recommender](assignment.pt-br.md).

# Build a recommender

## Instructions

Given your exercises in this lesson, you now know how to build a JavaScript-based web app using Onnx Runtime and a converted Onnx model. Experiment with building a new recommender using data from these lessons or sourced elsewhere (give credit, please). You might create a pet recommender given various personality attributes, or a music genre recommender based on a person's mood. Be creative!

## Rubric

| Criteria | Exemplary | Adequate | Needs improvement |
| -------- | --------- | -------- | ----------------- |
| | A web app and notebook are presented, both well documented and running | One of those two is missing or flawed | Both are either missing or flawed |

# Getting started with classification

## Regional topic: Delicious Asian and Indian Cuisines 🍜

In Asia and India, food traditions are extremely diverse, and very delicious! Let's look at data about regional cuisines to try to understand their ingredients.

> Photo by <a href="https://unsplash.com/@changlisheng?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Lisheng Chang</a> on <a href="https://unsplash.com/s/photos/asian-food?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>

## What you will learn

In this section, you will build on the skills you learned in the first part of this curriculum, all about regression, and learn about other classifiers that can help you understand your data.

> There are useful low-code tools that can help you learn about working with classification models. Try [Azure ML for this task](https://docs.microsoft.com/learn/modules/create-classification-model-azure-machine-learning-designer/?WT.mc_id=academic-15963-cxa)

## Lessons

1. [Introduction to classification](../1-Introduction/README.pt-br.md)
2. [More classifiers](../2-Classifiers-1/README.pt-br.md)
3. [Yet other classifiers](../3-Classifiers-2/README.pt-br.md)
4. [Applied ML: build a web app](../4-Applied/README.pt-br.md)

## Credits

"Getting started with classification" was written with ♥️ by [Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper)

The delicious cuisines dataset was sourced from [Kaggle](https://www.kaggle.com/hoandan/asian-and-indian-cuisines).