@ -13,6 +13,10 @@ Now you are ready to dive deeper into regression for ML. While visualization all
In this lesson, you will learn more about two types of regression: _basic linear regression_ and _polynomial regression_, along with some of the math underlying these techniques. Those models will allow us to predict pumpkin prices depending on different input data.
[![ML for beginners - Understanding Linear Regression](https://img.youtube.com/vi/CRxFT8oTDMg/0.jpg)](https://youtu.be/CRxFT8oTDMg "ML for beginners - Understanding Linear Regression")
> 🎥 Click the image above for a short video overview of linear regression.
> Throughout this curriculum, we assume minimal knowledge of math, and seek to make it accessible for students coming from other fields, so watch for notes, 🧮 callouts, diagrams, and other learning tools to aid in comprehension.
### Prerequisite
@ -95,6 +99,10 @@ Now that you have an understanding of the math behind linear regression, let's c
## Looking for Correlation
[![ML for beginners - Looking for Correlation: The Key to Linear Regression](https://img.youtube.com/vi/uoRq-lW2eQo/0.jpg)](https://youtu.be/uoRq-lW2eQo "ML for beginners - Looking for Correlation: The Key to Linear Regression")
> 🎥 Click the image above for a short video overview of correlation.
From the previous lesson you have probably seen that the average price for different months looks like this:
<img alt="Average price by month" src="../2-Data/images/barchart.png" width="50%"/>
@ -151,6 +159,10 @@ Another approach would be to fill those empty values with mean values from the c
## Simple Linear Regression
[![ML for beginners - Linear and Polynomial Regression using Scikit-learn](https://img.youtube.com/vi/e4c_UP2fSjg/0.jpg)](https://youtu.be/e4c_UP2fSjg "ML for beginners - Linear and Polynomial Regression using Scikit-learn")
> 🎥 Click the image above for a short video overview of linear and polynomial regression.
To train our Linear Regression model, we will use the **Scikit-learn** library.
Another type of Linear Regression is Polynomial Regression. While sometimes there's a linear relationship between variables - the bigger the pumpkin in volume, the higher the price - sometimes these relationships can't be plotted as a plane or straight line.
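The difference between a straight-line fit and a polynomial fit can be sketched on synthetic data. Everything below is an illustration under stated assumptions: the `day_of_year` and `price` values are made up rather than taken from the pumpkin dataset, and degree 2 is just one reasonable choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, made-up data: the price rises and then falls over the season,
# so a straight line cannot follow it well.
rng = np.random.default_rng(0)
day_of_year = np.linspace(240, 330, 40).reshape(-1, 1)
price = 30 - 0.01 * (day_of_year.ravel() - 285) ** 2 + rng.normal(0, 0.5, size=40)

# Plain linear regression: price = a * day + b
linear_model = LinearRegression().fit(day_of_year, price)

# Polynomial regression: add a day^2 feature, then fit the same linear model
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(day_of_year, price)

# For brevity, both models are scored on the data they were trained on
mse_linear = mean_squared_error(price, linear_model.predict(day_of_year))
mse_poly = mean_squared_error(price, poly_model.predict(day_of_year))
r2_linear = linear_model.score(day_of_year, price)
r2_poly = poly_model.score(day_of_year, price)

print(f"Linear:     MSE={mse_linear:.2f}  R^2={r2_linear:.3f}")
print(f"Polynomial: MSE={mse_poly:.2f}  R^2={r2_poly:.3f}")
```

Note that the polynomial model is still a linear model in its coefficients; only the input features are expanded. In a real experiment you would evaluate on a held-out test set rather than on the training data.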
@ -249,6 +260,10 @@ Using Polynomial Regression, we can get slightly lower MSE and higher determinat
In the ideal world, we want to be able to predict prices for different pumpkin varieties using the same model. However, the `Variety` column is somewhat different from columns like `Month`, because it contains non-numeric values. Such columns are called **categorical**.
[![ML for beginners - Categorical Feature Predictions with Linear Regression](https://img.youtube.com/vi/DYGliioIAE0/0.jpg)](https://youtu.be/DYGliioIAE0 "ML for beginners - Categorical Feature Predictions with Linear Regression")
> 🎥 Click the image above for a short video overview of using categorical features.
Here you can see how average price depends on variety:
<img alt="Average price by variety" src="images/price-by-variety.png" width="50%"/>
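One common way to feed a categorical column such as `Variety` into a linear model is one-hot encoding. The sketch below is a minimal illustration, not necessarily the approach the lesson takes: the rows and prices are made up, and only the column names `Variety` and `Price` mirror the dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy rows with made-up prices; the variety labels are illustrative only.
pumpkins = pd.DataFrame({
    "Variety": ["PIE TYPE", "MINIATURE", "FAIRYTALE",
                "PIE TYPE", "MINIATURE", "FAIRYTALE"],
    "Price": [30.0, 18.0, 55.0, 32.0, 16.0, 53.0],
})

# One-hot encoding: one indicator column per variety,
# with exactly one column set to 1 in each row.
X = pd.get_dummies(pumpkins["Variety"], dtype=float)
y = pumpkins["Price"]

# With only indicator columns as inputs, the model predicts
# the average price of each variety.
model = LinearRegression().fit(X, y)
predicted = model.predict(X)

print(X.columns.tolist())
print(predicted.round(2))
```

The encoded columns can be combined with numeric columns such as `Month` so that a single model covers all varieties.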
✅ Deepen your understanding of working with this type of regression in this [Learn module](https://docs.microsoft.com/learn/modules/train-evaluate-classification-models?WT.mc_id=academic-77952-leestott)
## Prerequisite
Having worked with the pumpkin data, we are now familiar enough with it to realize that there's one binary category that we can work with: `Color`.
@ -34,12 +35,17 @@ For our purposes, we will express this as a binary: 'White' or 'Not White'. Ther
Logistic regression differs from linear regression, which you learned about previously, in a few important ways.
[![ML for beginners - Logistic Regression for classification of data](https://img.youtube.com/vi/MmZS2otPrQ8/0.jpg)](https://youtu.be/MmZS2otPrQ8 "ML for beginners - Logistic Regression for classification of data")
> 🎥 Click the image above for a short video overview of logistic regression.
### Binary classification
Logistic regression does not offer the same features as linear regression. The former offers a prediction about a binary category ("orange or not orange"), whereas the latter is capable of predicting continuous values, for example, given the origin of a pumpkin and the time of harvest, _how much its price will rise_.
> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
### Other classifications
There are other types of logistic regression, including multinomial and ordinal:
@ -57,6 +63,10 @@ Remember how linear regression worked better with more correlated variables? Log
Logistic regression will give more accurate results if you use more data; our small dataset is not optimal for this task, so keep that in mind.
[![ML for beginners - Data Analysis and Preparation for Logistic Regression](https://img.youtube.com/vi/B2X4H9vcXTs/0.jpg)](https://youtu.be/B2X4H9vcXTs "ML for beginners - Data Analysis and Preparation for Logistic Regression")
> 🎥 Click the image above for a short video overview of preparing data for logistic regression.
✅ Think about the types of data that would lend themselves well to logistic regression
## Exercise - tidy the data
@ -215,6 +225,10 @@ You can visualize variables side-by-side with Seaborn plots.
Building a model for this binary classification is surprisingly straightforward in Scikit-learn.
[![ML for beginners - Logistic Regression for classification of data](https://img.youtube.com/vi/MmZS2otPrQ8/0.jpg)](https://youtu.be/MmZS2otPrQ8 "ML for beginners - Logistic Regression for classification of data")
> 🎥 Click the image above for a short video overview of building a logistic regression model.
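
Before working through the steps on the pumpkin data, the overall shape of the workflow can be sketched on a tiny synthetic problem. The data below is made up (a single numeric feature with a label that flips at 10), and the split size and `random_state` are arbitrary choices for the illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up one-feature dataset: the label is 1 when the feature is 10 or more.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = (X.ravel() >= 10).astype(int)

# Hold out a quarter of the rows for testing, keeping both classes balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the classifier on the training rows only.
model = LogisticRegression().fit(X_train, y_train)

# predict() returns class labels; predict_proba() returns one
# probability per class for each row.
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)
accuracy = model.score(X_test, y_test)

print("Predicted labels:", predictions)
print("Accuracy on the test set:", accuracy)
```

The same calls apply to the pumpkin data once the feature columns and the label column have been selected.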
1. Select the variables you want to use in your classification model and split them into training and test sets by calling `train_test_split()`:
```python
@ -327,6 +341,10 @@ Let's revisit the terms we saw earlier with the help of the confusion matrix's m
## Visualize the ROC curve of this model
[![ML for beginners - Analyzing Logistic Regression Performance with ROC Curves](https://img.youtube.com/vi/GApO575jTA0/0.jpg)](https://youtu.be/GApO575jTA0 "ML for beginners - Analyzing Logistic Regression Performance with ROC Curves")
> 🎥 Click the image above for a short video overview of ROC curves.
Let's do one more visualization to see the so-called 'ROC' curve:
Using Matplotlib, plot the model's [Receiver Operating Characteristic](https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html?highlight=roc) or ROC. ROC curves are often used to get a view of the output of a classifier in terms of its true vs. false positives. "ROC curves typically feature true positive rate on the Y axis, and false positive rate on the X axis." Thus, the steepness of the curve and the space between the midpoint line and the curve matter: you want a curve that quickly heads up and over the line. In our case, there are false positives to start with, and then the line heads up and over properly:
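To make the two axes concrete, here is a minimal sketch of how the points of a ROC curve and the area under it can be computed with Scikit-learn. The labels and scores are a small made-up example, not output from the pumpkin model; in practice the scores would come from something like `model.predict_proba(X_test)[:, 1]`.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up true labels and predicted probabilities for the positive class.
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# Each threshold yields one (false positive rate, true positive rate) point.
fpr, tpr, thresholds = roc_curve(y_true, y_scores)

# Area under the curve: 0.5 is no better than chance, 1.0 is a perfect ranking.
auc = roc_auc_score(y_true, y_scores)

for x, y_point in zip(fpr, tpr):
    print(f"FPR={x:.2f}  TPR={y_point:.2f}")
print(f"AUC={auc:.2f}")
```

The `fpr` and `tpr` arrays are what you would pass to Matplotlib to draw the curve, with `fpr` on the X axis and `tpr` on the Y axis.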