starting classification lesson translated to pt-br

pull/483/head
ariannemacena 4 years ago
parent 0425110f27
commit fa91f7a19b

@ -0,0 +1,299 @@
# Introduction to classification
In these four lessons, you will explore a fundamental focus of classic machine learning - _classification_. We will walk through using various classification algorithms with a dataset about all the brilliant cuisines of Asia and India. Hope you're hungry!
![just a pinch!](../images/pinch.png)
> Celebrate pan-Asian cuisines in these lessons! Image by [Jen Looper](https://twitter.com/jenlooper)
Classification is a form of [supervised learning](https://wikipedia.org/wiki/Supervised_learning) that has a lot in common with regression techniques. If machine learning is all about predicting values or assigning names to things by using datasets, then classification generally falls into two groups: _binary classification_ and _multiclass classification_.
[![Introduction to classification](https://img.youtube.com/vi/eg8DJYwdMyg/0.jpg)](https://youtu.be/eg8DJYwdMyg "Introduction to classification")
> 🎥 Click the image above for a video: MIT's John Guttag introduces classification
Remember:
- **Linear regression** helped you predict relationships between variables and make accurate predictions on where a new datapoint would fall in relationship to that line. So, you could predict _what price a pumpkin would be in September vs. December_, for example.
- **Logistic regression** helped you discover "binary categories": at this price point, _is this pumpkin orange or not-orange_?
Classification uses various algorithms to determine a data point's label or class. Let's work with this cuisine data to see whether, by observing a group of ingredients, we can determine its cuisine of origin.
## [Pre-lesson quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/19/?loc=br)
> ### [This lesson is available in R!](../solution/R/lesson_10-R.ipynb)
### Introduction
Classification is one of the fundamental activities of the machine learning researcher and data scientist. From basic classification of a binary value ("is this email spam or not?"), to complex image classification and segmentation using computer vision, it's always useful to be able to sort data into classes and ask questions of it.
To state the process in a more scientific way, your classification method creates a predictive model that enables you to map the relationship between input variables and output variables.
![binary vs. multiclass classification](../images/binary-multiclass.png)
> Binary vs. multiclass problems for classification algorithms to handle. Infographic by [Jen Looper](https://twitter.com/jenlooper)
Before starting the process of cleaning our data, visualizing it, and prepping it for our ML tasks, let's learn a bit about the various ways machine learning can be leveraged to classify data.
Derived from [statistics](https://wikipedia.org/wiki/Statistical_classification), classification using classic machine learning uses features, such as `smoker`, `weight`, and `age`, to determine the _likelihood of developing X disease_. As a supervised learning technique similar to the regression exercises you performed earlier, your data is labeled, and the ML algorithms use those labels to classify features of a dataset, predict the class each data point belongs to, and assign it to a group or outcome.
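To make this concrete, here is a minimal sketch of that idea using scikit-learn (which these lessons rely on) and an invented toy dataset; the feature values and labels are illustrative only:
```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical labeled data: each row is [smoker (0/1), weight (kg), age]
X = [[1, 80, 45], [0, 60, 30], [1, 95, 60], [0, 70, 25]]
y = ['high risk', 'low risk', 'high risk', 'low risk']

# The classifier learns the mapping from input features to output labels...
model = DecisionTreeClassifier().fit(X, y)

# ...and predicts a class for a new, unseen data point
print(model.predict([[1, 85, 50]]))  # e.g. ['high risk']
```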
✅ Take a moment to imagine a dataset about cuisines. What would a multiclass model be able to answer? What would a binary model be able to answer? What if you wanted to determine whether a given cuisine was likely to use fenugreek? What if you wanted to see whether, given a grocery bag full of star anise, artichokes, cauliflower, and horseradish, you could create a typical Indian dish?
[![Crazy mystery baskets](https://img.youtube.com/vi/GuTeDbaNoEU/0.jpg)](https://youtu.be/GuTeDbaNoEU "Crazy mystery baskets")
> 🎥 Click the image above for a video. The whole premise of the show 'Chopped' is the 'mystery basket' where chefs have to make some dish out of a random choice of ingredients. Surely an ML model would have helped!
## Hello 'classifier'
The question we want to ask of this cuisine dataset is actually a **multiclass question**, as we have several potential national cuisines to work with. Given a batch of ingredients, which of these many classes will the data fit?
Scikit-learn offers several different algorithms to use to classify data, depending on the kind of problem you want to solve. In the next two lessons, you'll learn about several of these algorithms.
## Exercise - clean and balance your data
The first task at hand, before starting this project, is to clean and **balance** your data to get better results. Start with the blank _notebook.ipynb_ file in the root of this folder.
The first thing to install is [imblearn](https://imbalanced-learn.org/stable/). This is a Scikit-learn package that will allow you to better balance the data (you will learn more about this task in a minute).
1. To install `imblearn`, run `pip install`, like so:
```python
pip install imblearn
```
1. Import the packages you need to import your data and visualize it; also import `SMOTE` from `imblearn`.
```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
from imblearn.over_sampling import SMOTE
```
Now you are set up to import the data next.
1. The next task will be to import the data:
```python
df = pd.read_csv('../data/cuisines.csv')
```
Using `read_csv()` will read the content of the csv file _cuisines.csv_ and place it in the variable `df`.
1. Take a look at the data by calling `head()`:
```python
df.head()
```
The first five rows look like this:
```output
| | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
| --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
| 0 | 65 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 66 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 67 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 68 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 69 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
```
1. Get info about this data by calling `info()`:
```python
df.info()
```
Your output resembles:
```output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2448 entries, 0 to 2447
Columns: 385 entries, Unnamed: 0 to zucchini
dtypes: int64(384), object(1)
memory usage: 7.2+ MB
```
## Exercise - learning about cuisines
Now the work starts to become more interesting. Let's discover the distribution of data per cuisine.
1. Plot the data as bars by calling `barh()`:
```python
df.cuisine.value_counts().plot.barh()
```
![cuisine data distribution](../images/cuisine-dist.png)
There are a finite number of cuisines, but the distribution of data is uneven. You can fix that! Before doing so, explore a little more.
1. Find out how much data is available per cuisine and print it out:
```python
thai_df = df[(df.cuisine == "thai")]
japanese_df = df[(df.cuisine == "japanese")]
chinese_df = df[(df.cuisine == "chinese")]
indian_df = df[(df.cuisine == "indian")]
korean_df = df[(df.cuisine == "korean")]
print(f'thai df: {thai_df.shape}')
print(f'japanese df: {japanese_df.shape}')
print(f'chinese df: {chinese_df.shape}')
print(f'indian df: {indian_df.shape}')
print(f'korean df: {korean_df.shape}')
```
The output looks like this:
```output
thai df: (289, 385)
japanese df: (320, 385)
chinese df: (442, 385)
indian df: (598, 385)
korean df: (799, 385)
```
## Discovering ingredients
Now you can dig deeper into the data and learn what the typical ingredients per cuisine are. You should clean out recurrent data that creates confusion between cuisines, so let's learn about this problem.
1. Create a function `create_ingredient_df()` in Python to create an ingredient dataframe. This function starts by dropping an unhelpful column and then sorts ingredients by their count:
```python
def create_ingredient_df(df):
    ingredient_df = df.T.drop(['cuisine','Unnamed: 0']).sum(axis=1).to_frame('value')
    ingredient_df = ingredient_df[(ingredient_df.T != 0).any()]
    ingredient_df = ingredient_df.sort_values(by='value', ascending=False,
                                              inplace=False)
    return ingredient_df
```
Now you can use that function to get an idea of the top ten most popular ingredients by cuisine.
1. Call `create_ingredient_df()` and plot the result by calling `barh()`:
```python
thai_ingredient_df = create_ingredient_df(thai_df)
thai_ingredient_df.head(10).plot.barh()
```
![thai](../images/thai.png)
1. Do the same for the japanese data:
```python
japanese_ingredient_df = create_ingredient_df(japanese_df)
japanese_ingredient_df.head(10).plot.barh()
```
![japanese](../images/japanese.png)
1. Now for the chinese ingredients:
```python
chinese_ingredient_df = create_ingredient_df(chinese_df)
chinese_ingredient_df.head(10).plot.barh()
```
![chinese](../images/chinese.png)
1. Plot the indian ingredients:
```python
indian_ingredient_df = create_ingredient_df(indian_df)
indian_ingredient_df.head(10).plot.barh()
```
![indian](../images/indian.png)
1. Finally, plot the korean ingredients:
```python
korean_ingredient_df = create_ingredient_df(korean_df)
korean_ingredient_df.head(10).plot.barh()
```
![korean](../images/korean.png)
1. Now, drop the most common ingredients that create confusion between distinct cuisines. Everyone loves rice, garlic and ginger! Drop them by calling `drop()`:
```python
feature_df = df.drop(['cuisine','Unnamed: 0','rice','garlic','ginger'], axis=1)
labels_df = df.cuisine #.unique()
feature_df.head()
```
## Balance the dataset
Now that you have cleaned the data, use [SMOTE](https://imbalanced-learn.org/dev/references/generated/imblearn.over_sampling.SMOTE.html) - "Synthetic Minority Over-sampling Technique" - to balance it.
1. Call `fit_resample()`; this strategy generates new samples by interpolation.
```python
oversample = SMOTE()
transformed_feature_df, transformed_label_df = oversample.fit_resample(feature_df, labels_df)
```
By balancing your data, you'll have better results when classifying it. Think about a binary classification. If most of your data is one class, an ML model is going to predict that class more frequently, just because there is more data for it. Balancing the data removes this skew.
1. Now you can check the number of labels per cuisine:
```python
print(f'new label count: {transformed_label_df.value_counts()}')
print(f'old label count: {df.cuisine.value_counts()}')
```
Your output looks like so:
```output
new label count: korean 799
chinese 799
indian 799
japanese 799
thai 799
Name: cuisine, dtype: int64
old label count: korean 799
indian 598
chinese 442
japanese 320
thai 289
Name: cuisine, dtype: int64
```
The data is nice and clean, balanced, and very delicious!
1. The last step is to save your balanced data, including labels and features, into a new dataframe that can be exported into a file:
```python
transformed_df = pd.concat([transformed_label_df,transformed_feature_df],axis=1, join='outer')
```
1. You can take one more look at the data using `transformed_df.head()` and `transformed_df.info()`. Save a copy of this data for use in future lessons:
```python
transformed_df.head()
transformed_df.info()
transformed_df.to_csv("../data/cleaned_cuisines.csv")
```
This fresh CSV can now be found in the root data folder.
---
## 🚀Challenge
This curriculum contains several interesting datasets. Dig through the `data` folders and see if any contain datasets that would be appropriate for binary or multiclass classification. What questions would you ask of these datasets?
## [Post-lesson quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/20?loc=br)
## Review & Self Study
Explore SMOTE's API. What use cases is it best suited for? What problems does it solve?
## Assignment
[Explore classification methods](assignment.pt-br.md)

@ -0,0 +1,11 @@
# Explore classification methods
## Instructions
In the [Scikit-learn documentation](https://scikit-learn.org/stable/supervised_learning.html) you'll find a large list of ways to classify data. Do a little scavenger hunt in these docs: your goal is to look for classification methods and match each with a dataset in this curriculum, a question you can ask of it, and a technique of classification. Create a spreadsheet or table in a .doc file and explain how the dataset would work with the classification algorithm.
## Rubric
| Criteria | Exemplary | Adequate | Needs improvement |
| -------- | --------- | -------- | ----------------- |
| | a document is presented overviewing 5 algorithms alongside a classification technique. The overview is well-explained and detailed. | a document is presented overviewing 3 algorithms alongside a classification technique. The overview is well-explained and detailed. | a document is presented overviewing fewer than three algorithms alongside a classification technique and the overview is neither well-explained nor detailed. |

@ -0,0 +1,243 @@
# Cuisine classifiers 1
In this lesson, you will use the dataset you saved from the last lesson, full of balanced, clean data all about cuisines.
You will use this dataset with a variety of classifiers to _predict a given national cuisine based on a group of ingredients_. While doing so, you'll learn more about some of the ways that algorithms can be leveraged for classification tasks.
## [Pre-lesson quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/21?loc=br)
# Preparation
Assuming you completed [Lesson 1](../../1-Introduction/README.pt-br.md), make sure that a _cleaned_cuisines.csv_ file exists in the root `/data` folder for these four lessons.
## Exercise - predict a national cuisine
1. Working in this lesson's _notebook.ipynb_ file, import that file along with the Pandas library:
```python
import pandas as pd
cuisines_df = pd.read_csv("../../data/cleaned_cuisines.csv")
cuisines_df.head()
```
The data looks like this:
| | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
| --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
| 0 | 0 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 3 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 4 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
1. Now, import several more libraries:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve
from sklearn.svm import SVC
import numpy as np
```
1. Divide the X and y coordinates into two dataframes for training. `cuisine` can be the labels dataframe:
```python
cuisines_label_df = cuisines_df['cuisine']
cuisines_label_df.head()
```
It will look like this:
```output
0 indian
1 indian
2 indian
3 indian
4 indian
Name: cuisine, dtype: object
```
1. Drop that `Unnamed: 0` column and the `cuisine` column, calling `drop()`. Save the rest of the data as trainable features:
```python
cuisines_feature_df = cuisines_df.drop(['Unnamed: 0', 'cuisine'], axis=1)
cuisines_feature_df.head()
```
Your features look like this:
| | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | artemisia | artichoke | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
| ---: | -----: | -------: | ----: | ---------: | ----: | -----------: | ------: | -------: | --------: | --------: | ---: | ------: | ----------: | ---------: | ----------------------: | ---: | ---: | ---: | ----: | -----: | -------: |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
Now you are ready to train your model!
## Choosing your classifier
Now that your data is clean and ready for training, you have to decide which algorithm to use for the job.
Scikit-learn groups classification under Supervised Learning, and in that category you will find many ways to classify. [The variety](https://scikit-learn.org/stable/supervised_learning.html) is quite bewildering at first sight. The following methods all include classification techniques:
- Linear Models
- Support Vector Machines
- Stochastic Gradient Descent
- Nearest Neighbors
- Gaussian Processes
- Decision Trees
- Ensemble methods (voting Classifier)
- Multiclass and multioutput algorithms (multiclass and multilabel classification, multiclass-multioutput classification)
> You can also use [neural networks to classify data](https://scikit-learn.org/stable/modules/neural_networks_supervised.html#classification), but that is outside the scope of this lesson.
### What classifier to go with?
So, which classifier should you choose? Often, running through several and looking for a good result is a way to test. Scikit-learn offers a [side-by-side comparison](https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html) on a created dataset, comparing KNeighbors, SVC two ways, GaussianProcessClassifier, DecisionTreeClassifier, RandomForestClassifier, MLPClassifier, AdaBoostClassifier, GaussianNB and QuadraticDiscriminantAnalysis, showing the results visualized:
![comparison of classifiers](../images/comparison.png)
> Plots generated from Scikit-learn's documentation
> AutoML solves this problem neatly by running these comparisons in the cloud, allowing you to choose the best algorithm for your data. Try it [here](https://docs.microsoft.com/learn/modules/automate-model-selection-with-azure-automl/?WT.mc_id=academic-15963-cxa)
### A better approach
A better way than wildly guessing, however, is to follow the ideas on this downloadable [ML Cheat sheet](https://docs.microsoft.com/azure/machine-learning/algorithm-cheat-sheet?WT.mc_id=academic-15963-cxa). Here, we discover that, for our multiclass problem, we have some choices:
![cheatsheet for multiclass problems](../images/cheatsheet.png)
> A section of Microsoft's Algorithm Cheat Sheet, detailing multiclass classification options
✅ Download this cheat sheet, print it out, and hang it on your wall!
### Reasoning
Let's see if we can reason our way through different approaches given the constraints we have:
- **Neural networks are too heavy**. Given our clean, but minimal dataset, and the fact that we are running training locally via notebooks, neural networks are too heavyweight for this task.
- **No two-class classifier**. We do not use a two-class classifier, so that rules out one-vs-all.
- **Decision tree or logistic regression could work**. A decision tree might work, or logistic regression for multiclass data.
- **Multiclass Boosted Decision Trees solve a different problem**. The multiclass boosted decision tree is most suitable for nonparametric tasks, e.g. tasks designed to build rankings, so it is not useful for us.
### Using Scikit-learn
We will be using Scikit-learn to analyze our data. However, there are many ways to use logistic regression in Scikit-learn. Take a look at the [parameters to pass](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logistic%20regressio#sklearn.linear_model.LogisticRegression).
Essentially, there are two important parameters, `multi_class` and `solver`, that we need to specify when we ask Scikit-learn to perform a logistic regression. The `multi_class` value applies a certain behavior, and the `solver` value specifies which algorithm to use. Not all solvers can be paired with all `multi_class` values.
According to the docs, in the multiclass case, the training algorithm:
- **Uses the one-vs-rest (OvR) scheme**, if the `multi_class` option is set to `ovr`
- **Uses the cross-entropy loss**, if the `multi_class` option is set to `multinomial`. (Currently the `multinomial` option is supported only by the lbfgs, sag, saga and newton-cg solvers.)
> 🎓 The 'scheme' here can either be 'ovr' (one-vs-rest) or 'multinomial'. Since logistic regression is really designed to support binary classification, these schemes allow it to better handle multiclass classification tasks. [source](https://machinelearningmastery.com/one-vs-rest-and-one-vs-one-for-multi-class-classification/)
> 🎓 The 'solver' is defined as "the algorithm to use in the optimization problem". [source](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logistic%20regressio#sklearn.linear_model.LogisticRegression).
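As a quick illustration of how these two parameters pair up in code, here is a minimal sketch of both configurations; the exercise below uses the first one:
```python
from sklearn.linear_model import LogisticRegression

# One-vs-rest scheme paired with the liblinear solver
lr_ovr = LogisticRegression(multi_class='ovr', solver='liblinear')

# Multinomial (cross-entropy) scheme; per the docs quoted above,
# only lbfgs, sag, saga and newton-cg support it
lr_multinomial = LogisticRegression(multi_class='multinomial', solver='lbfgs')
```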
Scikit-learn offers this table to explain how solvers handle different challenges presented by different kinds of data structures:
![solvers](../images/solvers.png)
## Exercise - split the data
We can focus on logistic regression for our first training trial since you recently learned about it in a previous lesson.
Split your data into training and testing groups by calling `train_test_split()`:
```python
X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)
```
## Exercise - apply logistic regression
Since you are using the multiclass case, you need to choose what _scheme_ to use and what _solver_ to set. Use LogisticRegression with a multiclass setting and the **liblinear** solver to train.
1. Create a logistic regression with multi_class set to `ovr` and the solver set to `liblinear`:
```python
lr = LogisticRegression(multi_class='ovr',solver='liblinear')
model = lr.fit(X_train, np.ravel(y_train))
accuracy = model.score(X_test, y_test)
print ("Accuracy is {}".format(accuracy))
```
✅ Try a different solver like `lbfgs`, which is often set as the default.
> Note: use the Pandas [`ravel()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.ravel.html) function to flatten your data when needed.
The accuracy is good at over **80%**!
1. You can see this model in action by testing one row of data (#50):
```python
print(f'ingredients: {X_test.iloc[50][X_test.iloc[50]!=0].keys()}')
print(f'cuisine: {y_test.iloc[50]}')
```
The result is printed:
```output
ingredients: Index(['cilantro', 'onion', 'pea', 'potato', 'tomato', 'vegetable_oil'], dtype='object')
cuisine: indian
```
✅ Try a different row number and check the results
1. Digging deeper, you can check for the accuracy of this prediction:
```python
test= X_test.iloc[50].values.reshape(-1, 1).T
proba = model.predict_proba(test)
classes = model.classes_
resultdf = pd.DataFrame(data=proba, columns=classes)
topPrediction = resultdf.T.sort_values(by=[0], ascending = [False])
topPrediction.head()
```
The result is printed - Indian cuisine is its best guess, with good probability:
| | 0 |
| -------: | -------: |
| indian | 0.715851 |
| chinese | 0.229475 |
| japanese | 0.029763 |
| korean | 0.017277 |
| thai | 0.007634 |
✅ Can you explain why the model is pretty sure this is an Indian cuisine?
1. Get more detail by printing a classification report, as you did in the regression lessons:
```python
y_pred = model.predict(X_test)
print(classification_report(y_test,y_pred))
```
| | precision | recall | f1-score | support |
| ------------ | --------- | ------ | -------- | ------- |
| chinese | 0.73 | 0.71 | 0.72 | 229 |
| indian | 0.91 | 0.93 | 0.92 | 254 |
| japanese | 0.70 | 0.75 | 0.72 | 220 |
| korean | 0.86 | 0.76 | 0.81 | 242 |
| thai | 0.79 | 0.85 | 0.82 | 254 |
| accuracy | | | 0.80 | 1199 |
| macro avg | 0.80 | 0.80 | 0.80 | 1199 |
| weighted avg | 0.80 | 0.80 | 0.80 | 1199 |
## 🚀Challenge
In this lesson, you used your cleaned data to build a machine learning model that can predict a national cuisine based on a series of ingredients. Take some time to read through the many options Scikit-learn provides to classify data. Dig deeper into the concept of 'solver' to understand what goes on behind the scenes.
## [Post-lesson quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/22?loc=br)
## Review & Self Study
Dig a little more into the math behind logistic regression in [this lesson](https://people.eecs.berkeley.edu/~russell/classes/cs194/f11/lectures/CS194%20Fall%202011%20Lecture%2006.pdf).
## Assignment
[Study the solvers](assignment.pt-br.md).

@ -0,0 +1,11 @@
# Study the solvers
## Instructions
In this lesson you learned about the various solvers that pair algorithms with a machine learning process to create an accurate model. Walk through the solvers listed in the lesson and pick two. In your own words, compare and contrast these two solvers. What kind of problem do they address? How do they work with various data structures? Why would you pick one over another?
## Rubric
| Criteria | Exemplary | Adequate | Needs improvement |
| -------- | --------- | -------- | ----------------- |
| | A .doc file is presented with two paragraphs, one on each solver, comparing them thoughtfully. | A .doc file is presented with only one paragraph | The assignment is incomplete |

@ -0,0 +1,235 @@
# Cuisine classifiers 2
In this second classification lesson, you will explore more ways to classify numeric data. You will also learn about the ramifications of choosing one classifier over another.
## [Pre-lesson quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/23?loc=br)
### Prerequisite
We assume that you have completed the previous lessons and have a cleaned dataset in your `data` folder called _cleaned_cuisines.csv_ in the root of this 4-lesson folder.
### Preparation
We have loaded your _notebook.ipynb_ file with the cleaned dataset and have divided it into X and y dataframes, ready for the model building process.
## A classification map
Previously, you learned about the various options you have when classifying data using Microsoft's cheat sheet. Scikit-learn offers a similar, but more granular cheat sheet that can further help narrow down your estimators (another term for classifiers):
![ML Map from Scikit-learn](../images/map.png)
> Tip: [visit this map online](https://scikit-learn.org/stable/tutorial/machine_learning_map/) and click along the path to read documentation.
### The plan
This map is very helpful once you have a clear grasp of your data, as you can 'walk' along its paths to a decision:
- We have >50 samples
- We want to predict a category
- We have labeled data
- We have fewer than 100K samples
- ✨ We can choose a Linear SVC
- If that doesn't work, since we have numeric data
- We can try a ✨ KNeighbors Classifier
- If that doesn't work, try ✨ SVC and ✨ Ensemble Classifiers
This is a very helpful trail to follow.
## Exercise - split the data
Following this path, we should start by importing some libraries to use.
1. Import the needed libraries:
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve
import numpy as np
```
1. Split your training and test data:
```python
X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)
```
## Linear SVC classifier
Support-Vector clustering (SVC) is a child of the Support-Vector machines family of ML techniques (learn more about these below). In this method, you can choose a 'kernel' to decide how to cluster the labels. The 'C' parameter refers to 'regularization' which regulates the influence of parameters. The kernel can be one of [several](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC); here we set it to 'linear' to ensure that we leverage linear SVC. Probability defaults to 'false'; here we set it to 'true' to gather probability estimates. We set the random state to '0' to shuffle the data to get probabilities.
### Exercise - apply a linear SVC
Start by creating an array of classifiers. You will add progressively to this array as we test.
1. Start with a Linear SVC:
```python
C = 10
# Create different classifiers.
classifiers = {
'Linear SVC': SVC(kernel='linear', C=C, probability=True,random_state=0)
}
```
2. Train your model using the Linear SVC and print out a report:
```python
n_classifiers = len(classifiers)
for index, (name, classifier) in enumerate(classifiers.items()):
classifier.fit(X_train, np.ravel(y_train))
y_pred = classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy (train) for %s: %0.1f%% " % (name, accuracy * 100))
print(classification_report(y_test,y_pred))
```
The result is pretty good:
```output
Accuracy (train) for Linear SVC: 78.6%
precision recall f1-score support
chinese 0.71 0.67 0.69 242
indian 0.88 0.86 0.87 234
japanese 0.79 0.74 0.76 254
korean 0.85 0.81 0.83 242
thai 0.71 0.86 0.78 227
accuracy 0.79 1199
macro avg 0.79 0.79 0.79 1199
weighted avg 0.79 0.79 0.79 1199
```
## K-Neighbors classifier
K-Neighbors is part of the "neighbors" family of ML methods, which can be used for both supervised and unsupervised learning. In this method, a predefined number of points is created and data are gathered around these points such that generalized labels can be predicted for the data.
### Exercise - apply the K-Neighbors classifier
The previous classifier was good, and worked well with the data, but maybe we can get better accuracy. Try a K-Neighbors classifier.
1. Add a line to your classifier array (add a comma after the Linear SVC item):
```python
'KNN classifier': KNeighborsClassifier(C),
```
The result is a little worse:
```output
Accuracy (train) for KNN classifier: 73.8%
precision recall f1-score support
chinese 0.64 0.67 0.66 242
indian 0.86 0.78 0.82 234
japanese 0.66 0.83 0.74 254
korean 0.94 0.58 0.72 242
thai 0.71 0.82 0.76 227
accuracy 0.74 1199
macro avg 0.76 0.74 0.74 1199
weighted avg 0.76 0.74 0.74 1199
```
✅ Learn about [K-Neighbors](https://scikit-learn.org/stable/modules/neighbors.html#neighbors)
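One detail worth noting in the line above: `KNeighborsClassifier(C)` reuses the `C = 10` variable from the Linear SVC setup, so it lands in the classifier's first positional parameter, `n_neighbors`. A small sketch of setting that parameter explicitly (the alternative value is illustrative, not tuned):
```python
from sklearn.neighbors import KNeighborsClassifier

# Equivalent to KNeighborsClassifier(C) above, with C = 10, but explicit
knn_10 = KNeighborsClassifier(n_neighbors=10)

# Try other neighborhood sizes to see how accuracy shifts
knn_5 = KNeighborsClassifier(n_neighbors=5)
```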
## Support Vector Classifier
Support-Vector classifiers are part of the [Support-Vector Machine](https://wikipedia.org/wiki/Support-vector_machine) family of ML methods that are used for classification and regression tasks. SVMs "map training examples to points in space" to maximize the distance between two categories. Subsequent data is mapped into this space so their category can be predicted.
### Exercise - apply a Support Vector Classifier
Let's try for a little better accuracy with a Support Vector Classifier.
1. Add a comma after the K-Neighbors item, and then add this line:
```python
'SVC': SVC(),
```
The result is quite good!
```output
Accuracy (train) for SVC: 83.2%
precision recall f1-score support
chinese 0.79 0.74 0.76 242
indian 0.88 0.90 0.89 234
japanese 0.87 0.81 0.84 254
korean 0.91 0.82 0.86 242
thai 0.74 0.90 0.81 227
accuracy 0.83 1199
macro avg 0.84 0.83 0.83 1199
weighted avg 0.84 0.83 0.83 1199
```
✅ Learn about [Support-Vectors](https://scikit-learn.org/stable/modules/svm.html#svm)
## Ensemble Classifiers
Let's follow the path to the very end, even though the previous test was quite good. Let's try some 'Ensemble Classifiers', specifically Random Forest and AdaBoost:
```python
'RFST': RandomForestClassifier(n_estimators=100),
'ADA': AdaBoostClassifier(n_estimators=100)
```
The result is very good, especially for Random Forest:
```output
Accuracy (train) for RFST: 84.5%
precision recall f1-score support
chinese 0.80 0.77 0.78 242
indian 0.89 0.92 0.90 234
japanese 0.86 0.84 0.85 254
korean 0.88 0.83 0.85 242
thai 0.80 0.87 0.83 227
accuracy 0.84 1199
macro avg 0.85 0.85 0.84 1199
weighted avg 0.85 0.84 0.84 1199
Accuracy (train) for ADA: 72.4%
precision recall f1-score support
chinese 0.64 0.49 0.56 242
indian 0.91 0.83 0.87 234
japanese 0.68 0.69 0.69 254
korean 0.73 0.79 0.76 242
thai 0.67 0.83 0.74 227
accuracy 0.72 1199
macro avg 0.73 0.73 0.72 1199
weighted avg 0.73 0.72 0.72 1199
```
✅ Learn about [Ensemble Classifiers](https://scikit-learn.org/stable/modules/ensemble.html)
This method of Machine Learning "combines the predictions of several base estimators" to improve the model's quality. In our example, we used Random Forest and AdaBoost.
- [Random Forest](https://scikit-learn.org/stable/modules/ensemble.html#forest), an averaging method, builds a 'forest' of 'decision trees' infused with randomness to avoid overfitting. The n_estimators parameter is set to the number of trees.
- [AdaBoost](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html) fits a classifier to a dataset and then fits copies of that classifier to the same dataset. It focuses on the weights of incorrectly classified items and adjusts the fit for the next classifier to correct.
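Both estimators expose parameters worth experimenting with (see the Challenge below); here is a small sketch with illustrative, untuned values:
```python
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

# More trees but shallower ones; random_state makes runs reproducible
rf = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=0)

# A smaller learning_rate shrinks each successive classifier's contribution
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
```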
---
## 🚀Challenge
Each of these techniques has a large number of parameters that you can tweak. Research each one's default parameters and think about what tweaking these parameters would mean for the model's quality.
## [Post-lesson quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/24?loc=br)
## Review & Self Study
There's a lot of jargon in these lessons, so take a minute to review [this list](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-15963-cxa) of useful terminology!
## Assignment
[Parameter play](assignment.pt-br.md).

@ -0,0 +1,11 @@
# Parameter Play
## Instructions
There are a lot of parameters that are set by default when working with these classifiers. IntelliSense in VS Code can help you dig into them. Adopt one of the ML classification techniques in this lesson and retrain models tweaking various parameter values. Build a notebook explaining why some changes help the model quality while others degrade it. Be detailed in your answer.
## Rubric
| Criteria | Exemplary | Adequate | Needs improvement |
| -------- | --------- | -------- | ----------------- |
| | A notebook is presented with a classifier fully built up and its parameters tweaked and changes explained in textboxes | A notebook is partially presented or poorly explained | A notebook is buggy or flawed |

@ -0,0 +1,337 @@
# Build a Cuisine Recommender Web App
In this lesson, you will build a classification model using some of the techniques you have learned in previous lessons and with the delicious cuisine dataset used throughout this series. In addition, you will build a small web app to use a saved model, leveraging Onnx's web runtime.
One of the most useful practical uses of machine learning is building recommendation systems, and you can take the first step in that direction today!
[![Recommendation Systems Introduction](https://img.youtube.com/vi/giIXNoiqO_U/0.jpg)](https://youtu.be/giIXNoiqO_U "Recommendation Systems Introduction")
> 🎥 Click the image above for a video: Andrew Ng introduces recommendation system design
## [Pre-lesson quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/25?loc=br)
In this lesson you will learn:
- How to build a model and save it as an Onnx model
- How to use Netron to inspect the model
- How to use your model in a web app for inference
## Build your model
Building applied ML systems is an important part of leveraging these technologies for your business systems. You can use models within your web applications (and thus use them in an offline context if needed) by using Onnx.
In a [previous lesson](../../3-Web-App/1-Web-App/README.md), you built a Regression model about UFO sightings, "pickled" it, and used it in a Flask app. While this architecture is very useful to know, it is a full-stack Python app, and your requirements may include the use of a JavaScript application.
In this lesson, you can build a basic JavaScript-based system for inference. First, however, you need to train a model and convert it for use with Onnx.
## Exercise - train classification model
First, train a classification model using the cleaned cuisines dataset we saved earlier.
1. Start by importing useful libraries:
```python
!pip install skl2onnx
import pandas as pd
```
You need '[skl2onnx](https://onnx.ai/sklearn-onnx/)' to help convert your Scikit-learn model to Onnx format.
1. Then, work with your data in the same way you did in previous lessons, by reading a CSV file using `read_csv()`:
```python
data = pd.read_csv('../data/cleaned_cuisines.csv')
data.head()
```
1. Remove the first two unnecessary columns and save the remaining data as 'X':
```python
X = data.iloc[:,2:]
X.head()
```
1. Save the labels as 'y':
```python
y = data[['cuisine']]
y.head()
```
### Commence the training routine
We will use the SVC classifier, which has good accuracy.
1. Import the appropriate libraries from Scikit-learn:
```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report
```
1. Separate training and test sets:
```python
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3)
```
1. Build an SVC Classification model as you did in the previous lesson:
```python
model = SVC(kernel='linear', C=10, probability=True,random_state=0)
model.fit(X_train,y_train.values.ravel())
```
1. Now, test your model, calling `predict()`:
```python
y_pred = model.predict(X_test)
```
1. Print out a classification report to check the model's quality:
```python
print(classification_report(y_test,y_pred))
```
As we saw before, the accuracy is good:
```output
precision recall f1-score support
chinese 0.72 0.69 0.70 257
indian 0.91 0.87 0.89 243
japanese 0.79 0.77 0.78 239
korean 0.83 0.79 0.81 236
thai 0.72 0.84 0.78 224
accuracy 0.79 1199
macro avg 0.79 0.79 0.79 1199
weighted avg 0.79 0.79 0.79 1199
```
### Convert your model to Onnx
Make sure to do the conversion with the proper tensor dimension. This dataset has 380 ingredients listed, so you need to notate that number in `FloatTensorType`:
1. Convert using a tensor dimension of 380.
```python
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
initial_type = [('float_input', FloatTensorType([None, 380]))]
options = {id(model): {'nocl': True, 'zipmap': False}}
```
1. Create the `onx` object and store it as a file named **model.onnx**:
```python
onx = convert_sklearn(model, initial_types=initial_type, options=options)
with open("./model.onnx", "wb") as f:
f.write(onx.SerializeToString())
```
> Note, you can pass in [options](https://onnx.ai/sklearn-onnx/parameterized.html) in your conversion script. In this case, we passed in 'nocl' to be True and 'zipmap' to be False. Since this is a classification model, you have the option to remove ZipMap which produces a list of dictionaries (not necessary). `nocl` refers to class information being included in the model. Reduce your model's size by setting `nocl` to 'True'.
Running the entire notebook will now build an Onnx model and save it to this folder.
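If you want to sanity-check the saved file from Python before moving to the browser, here is a minimal sketch using the `onnxruntime` package (assuming you have it installed; the input name `float_input` matches the conversion script above):
```python
import numpy as np
import onnxruntime as rt

session = rt.InferenceSession("./model.onnx")

# One row of 380 ingredient flags, the same shape the web app will send
sample = np.zeros((1, 380), dtype=np.float32)
sample[0, 4] = 1  # flag one ingredient, chosen arbitrarily for the test

# Prints the raw outputs; the exact format depends on the conversion options
print(session.run(None, {"float_input": sample}))
```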
## View your model
Onnx models are not very visible in Visual Studio Code, but there's a very good free tool that many researchers use to visualize the model and ensure that it is properly built. Download [Netron](https://github.com/lutzroeder/Netron) and open your model.onnx file. You can see your simple model visualized, with its 380 inputs and classifier listed:
![Netron visual](../images/netron.png)
Netron is a helpful tool to view your models.
Now you are ready to use this neat model in a web app. Let's build an app that will come in handy when you look in your refrigerator and try to figure out which combination of your leftover ingredients you can use to cook a given cuisine, as determined by your model.
## Build a recommender web application
You can use your model directly in a web app. This architecture also allows you to run it locally and even offline if needed. Start by creating an `index.html` file in the same folder where you stored your `model.onnx` file.
1. In this file _index.html_, add the following markup:
```html
<!DOCTYPE html>
<html>
    <head>
        <title>Cuisine Matcher</title>
    </head>
    <body>
        ...
    </body>
</html>
```
1. Now, working within the `body` tags, add a little markup to show a list of checkboxes reflecting some ingredients:
```html
<h1>Check your refrigerator. What can you create?</h1>
<div id="wrapper">
    <div class="boxCont">
        <input type="checkbox" value="4" class="checkbox">
        <label>apple</label>
    </div>
    <div class="boxCont">
        <input type="checkbox" value="247" class="checkbox">
        <label>pear</label>
    </div>
    <div class="boxCont">
        <input type="checkbox" value="77" class="checkbox">
        <label>cherry</label>
    </div>
    <div class="boxCont">
        <input type="checkbox" value="126" class="checkbox">
        <label>fenugreek</label>
    </div>
    <div class="boxCont">
        <input type="checkbox" value="302" class="checkbox">
        <label>sake</label>
    </div>
    <div class="boxCont">
        <input type="checkbox" value="327" class="checkbox">
        <label>soy sauce</label>
    </div>
    <div class="boxCont">
        <input type="checkbox" value="112" class="checkbox">
        <label>cumin</label>
    </div>
</div>
<div style="padding-top:10px">
    <button onClick="startInference()">What kind of cuisine can you make?</button>
</div>
```
Notice that each checkbox is given a value. This reflects the index where the ingredient is found according to the dataset. Apple, for example, in this alphabetic list, occupies the fifth column, so its value is '4' since we start counting at 0. You can consult the [ingredients spreadsheet](../data/ingredient_indexes.csv) to discover a given ingredient's index.
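If you'd rather look an index up programmatically than scan the spreadsheet, a small sketch in Python works too, assuming the checkbox values follow the column order of _cleaned_cuisines.csv_ minus its first two columns:
```python
import pandas as pd

# Feature columns in the same order the model's 380 inputs use
features = list(pd.read_csv('../data/cleaned_cuisines.csv').columns[2:])
print(features.index('apple'))  # -> 4, matching the checkbox value above
```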
Continuing your work in the index.html file, add a script block where the model is called after the final closing `</div>`.
1. First, import the [Onnx Runtime](https://www.onnxruntime.ai/):
```html
<script src="https://cdn.jsdelivr.net/npm/onnxruntime-web@1.9.0/dist/ort.min.js"></script>
```
> Onnx Runtime is used to enable running your Onnx models across a wide range of hardware platforms, including optimizations and an API to use.
1. Once the Runtime is in place, you can call it:
```javascript
<script>
const ingredients = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
const checks = [].slice.call(document.querySelectorAll('.checkbox'));

// use an async context to call onnxruntime functions.
function init() {
    checks.forEach(function (checkbox, index) {
        checkbox.onchange = function () {
            if (this.checked) {
                var index = checkbox.value;
                if (index !== -1) {
                    ingredients[index] = 1;
                }
                console.log(ingredients)
            } else {
                var index = checkbox.value;
                if (index !== -1) {
                    ingredients[index] = 0;
                }
                console.log(ingredients)
            }
        }
    })
}

function testCheckboxes() {
    for (var i = 0; i < checks.length; i++)
        if (checks[i].type == "checkbox")
            if (checks[i].checked)
                return true;
    return false;
}

async function startInference() {
    let checked = testCheckboxes()
    if (checked) {
        try {
            // create a new session and load the model.
            const session = await ort.InferenceSession.create('./model.onnx');
            const input = new ort.Tensor(new Float32Array(ingredients), [1, 380]);
            const feeds = { float_input: input };
            // feed inputs and run
            const results = await session.run(feeds);
            // read from results
            alert('You can enjoy ' + results.label.data[0] + ' cuisine today!')
        } catch (e) {
            console.log(`failed to inference ONNX model: ${e}.`);
        }
    } else alert("Please check an ingredient")
}

init();
</script>
```
In this code, there are several things happening:
1. You created an array of 380 possible values (1 or 0) to be set and sent to the model for inference, depending on whether an ingredient checkbox is checked.
2. You created an array of checkboxes and a way to determine whether they were checked in an `init` function that is called when the application starts. When a checkbox is checked, the `ingredients` array is altered to reflect the chosen ingredient.
3. You created a `testCheckboxes` function that checks whether any checkbox was checked.
4. You use that function when the button is pressed and, if any checkbox is checked, you start inference.
5. The inference routine includes:
1. Setting up an asynchronous load of the model
2. Creating a Tensor structure to send to the model
3. Creating 'feeds' that reflects the `float_input` input that you created when training your model (you can use Netron to verify that name)
4. Sending these 'feeds' to the model and waiting for a response
## Test your application
Open a terminal session in Visual Studio Code in the folder where your index.html file resides. Ensure that you have [http-server](https://www.npmjs.com/package/http-server) installed globally, and type `http-server` at the prompt. A localhost server should open, and you can view your web app. Check what cuisine is recommended based on various ingredients:
![ingredient web app](../images/web-app.png)
Congratulations, you have created a 'recommendation' web app with a few fields. Take some time to build out this system!
## 🚀Challenge
Your web app is very minimal, so continue to build it out using ingredients and their indexes from the [ingredient_indexes](../data/ingredient_indexes.csv) data. What flavor combinations work to create a given national dish?
## [Post-lesson quiz](https://white-water-09ec41f0f.azurestaticapps.net/quiz/26?loc=br)
## Review & Self Study
While this lesson just touched on the utility of creating a recommendation system for food ingredients, this area of ML applications is very rich in examples. Read some more about how these systems are built:
- https://www.sciencedirect.com/topics/computer-science/recommendation-engine
- https://www.technologyreview.com/2014/08/25/171547/the-ultimate-challenge-for-recommendation-engines/
- https://www.technologyreview.com/2015/03/23/168831/everything-is-a-recommendation/
## Assignment
[Build a new recommender](assignment.pt-br.md).

@ -0,0 +1,11 @@
# Build a recommender
## Instructions
Given your exercises in this lesson, you now know how to build a JavaScript-based web app using Onnx Runtime and a converted Onnx model. Experiment with building a new recommender using data from these lessons or sourced elsewhere (give credit, please). You might create a pet recommender given various personality attributes, or a music genre recommender based on a person's mood. Be creative!
## Rubric
| Criteria | Exemplary | Adequate | Needs improvement |
| -------- | --------- | -------- | ----------------- |
| | A web app and notebook are presented, both well documented and running | One of those two is missing or flawed | Both are either missing or flawed |

@ -0,0 +1,27 @@
# Getting started with classification
## Regional topic: Delicious Asian and Indian Cuisines 🍜
In Asia and India, food traditions are extremely diverse, and very delicious! Let's look at data about regional cuisines to try to understand their ingredients.
![Thai food seller](../images/thai-food.jpg)
> Photo by <a href="https://unsplash.com/@changlisheng?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Lisheng Chang</a> on <a href="https://unsplash.com/s/photos/asian-food?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>
## What you will learn
In this section, you will build on the skills you learned in the first part of this curriculum all about regression to learn about other classifiers you can use that will help you learn about your data.
> There are useful low-code tools that can help you learn about working with classification models. Try [Azure ML for this task](https://docs.microsoft.com/learn/modules/create-classification-model-azure-machine-learning-designer/?WT.mc_id=academic-15963-cxa)
## Lessons
1. [Introduction to classification](../1-Introduction/README.pt-br.md)
2. [More classifiers](../2-Classifiers-1/README.pt-br.md)
3. [Yet other classifiers](../3-Classifiers-2/README.pt-br.md)
4. [Applied ML: build a web app](../4-Applied/README.pt-br.md)
## Credits
"Getting started with classification" was written with ♥️ by [Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper)
The delicious cuisines dataset was sourced from [Kaggle](https://www.kaggle.com/hoandan/asian-and-indian-cuisines).