Logistic regression does not offer the same features as linear regression. The former offers a prediction about a binary category ("white or not white") whereas the latter is capable of predicting continuous values, for example, given the origin of a pumpkin and the time of harvest, how much its price will rise.
![Infographic by Dasani Madipalli](../../images/pumpkin-classifier.png){width="600"}
#### **Other classifications**
There are other types of logistic regression, including multinomial and ordinal:
- **Multinomial**, which involves more than two categories - "Orange, White, and Striped".
- **Ordinal**, which involves ordered categories, useful if we wanted to order our outcomes logically, like our pumpkins that are ordered by a finite number of sizes (mini, sm, med, lg, xl, xxl) - see the code sketch after the image below.
![Multinomial vs ordinal regression](../../images/multinomial-vs-ordinal.png)
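To make the distinction concrete, here is a minimal sketch in R of fitting both kinds of model on a toy data frame (not our pumpkin data). The `nnet` and `MASS` packages, the toy column names, and the simulated values are all assumptions for illustration:

```{r multinomial_ordinal_sketch}
library(nnet) # provides multinom() for multinomial logistic regression
library(MASS) # provides polr() for ordinal (proportional odds) regression

set.seed(2056)
# A toy data frame: two numeric predictors plus two kinds of categorical outcome
toy <- data.frame(weight = runif(120, 1, 10), price = runif(120, 2, 8))

# Unordered outcome with more than two categories: "Orange, White, and Striped"
toy$color <- factor(sample(c("orange", "white", "striped"), 120, replace = TRUE))

# Ordered outcome: the six pumpkin sizes, smallest to largest
toy$size <- factor(sample(c("mini", "sm", "med", "lg", "xl", "xxl"), 120, replace = TRUE),
                   levels = c("mini", "sm", "med", "lg", "xl", "xxl"), ordered = TRUE)

# Multinomial: one set of coefficients per non-reference class
multi_fit <- multinom(color ~ weight + price, data = toy)

# Ordinal: a single set of slopes plus ordered intercepts (thresholds)
ord_fit <- polr(size ~ weight + price, data = toy)

summary(multi_fit)
summary(ord_fit)
```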
#### **It's still linear**
Even though this type of regression is all about 'category predictions', it still works best when there is a clear linear relationship between the dependent variable (color) and the other independent variables (the rest of the dataset, like city name and size). It's good to get an idea of whether there is any linearity dividing these variables.
#### **Variables DO NOT have to correlate**
Remember how linear regression worked better with more correlated variables? Logistic regression is the opposite - the variables don't have to correlate. That suits this data, which has only somewhat weak correlations.
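One quick way to check this for yourself, once the features have been numerically encoded (the `baked_pumpkins` data frame that appears below), is to print a rounded correlation matrix - a small sketch, assuming all encoded columns are numeric:

```{r correlation_check}
# Correlation matrix of the encoded features (values near 0 = weak correlation)
baked_pumpkins %>% 
  select(where(is.numeric)) %>% 
  cor() %>% 
  round(digits = 2)
```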
```{r}
# Preview the baked (encoded) data
baked_pumpkins %>% 
  slice_head(n = 5)
```
### **Analysing relationships between features and label**

Now, let's make a categorical plot showing the distribution of the predictors with respect to the outcome color!

```{r bar_plot}
# Specify colors for each value of the hue variable
palette <- c(ORANGE = "orange", WHITE = "wheat")

# Create the bar plot
ggplot(pumpkins_select, aes(y = variety, fill = color)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(values = palette) +
  labs(y = "Variety", fill = "Color") +
  theme_minimal()
```

Now let's compare the feature distributions for each label value using box plots. We'll begin by formatting the data to a *long* format to make it somewhat easier to make multiple `facets`.

```{r pivot}
# Pivot the encoded data to long format
baked_pumpkins_long <- baked_pumpkins %>% 
  pivot_longer(!color, names_to = "features", values_to = "values")

# Print out restructured data
baked_pumpkins_long %>% 
  slice_head(n = 10)
```

Now, let's make some boxplots showing the distribution of the predictors with respect to the outcome color!

```{r boxplots}
theme_set(theme_light())

# Make a box plot for each predictor feature
baked_pumpkins_long %>% 
  mutate(color = factor(color)) %>% 
  ggplot(mapping = aes(x = color, y = values, fill = features)) +
  geom_boxplot() +
  facet_wrap(~ features, scales = "free") +
  theme(legend.position = "none")
```

Amazing🤩! For some of the features, there's a noticeable difference in the distribution for each color label. For instance, it seems the white pumpkins can be found in smaller packages and in some particular varieties of pumpkins. The *item_size* category also seems to make a difference in the color distribution. These features may help predict the color of a pumpkin.

Let's now focus on a specific relationship: Item Size and Color!
#### **Use a swarm plot**
```{r swarm_plot}
# We need the encoded item_size column to use it as one of the plot axes
baked_pumpkins %>% 
  mutate(color = factor(color)) %>% 
  ggplot(mapping = aes(x = color, y = item_size, color = color)) +
  ggbeeswarm::geom_quasirandom() +
  scale_color_brewer(palette = "Dark2", direction = -1) +
  theme(legend.position = "none")
```
Now that we have an idea of the relationship between the binary categories of color and the larger group of sizes, let's explore logistic regression to determine a given pumpkin's likely color.
## 3. Build your model
> **🧮 Show Me The Math**
>
> Remember how `linear regression` often used `ordinary least squares` to arrive at a value? `Logistic regression` relies on the concept of 'maximum likelihood' using [`sigmoid functions`](https://wikipedia.org/wiki/Sigmoid_function). Plotted, a sigmoid function looks like an `S` shape. It takes a value and maps it to somewhere between 0 and 1. Its curve is also called a 'logistic curve'. Its formula looks like this:
>
> ![](../../images/sigmoid.png)
>
> where the sigmoid's midpoint sits at x's 0 point (for the basic curve), L is the curve's maximum value, and k is the curve's steepness. If the outcome of the function is more than 0.5, the label in question will be given the class `1` of the binary choice; if not, it will be classified as `0`.
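> In symbols, a standard way to write that logistic curve (with $x_0$ the midpoint) is:
>
> $$f(x) = \frac{L}{1 + e^{-k(x - x_0)}}$$
>
> The basic sigmoid used for binary classification is the special case $L = 1$, $k = 1$, $x_0 = 0$, i.e. $\sigma(x) = \frac{1}{1 + e^{-x}}$.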
Let's begin by splitting the data into `training` and `test` sets. The training set is used to train a classifier so that it finds a statistical relationship between the features and the label value.
It is best practice to hold out some of your data for **testing** in order to get a better estimate of how your models will perform on new data by comparing the predicted labels with the already known labels in the test set. [rsample](https://rsample.tidymodels.org/), a package in Tidymodels, provides infrastructure for efficient data splitting and resampling:
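A minimal sketch of that split - assuming the unbaked `pumpkins_select` data frame from earlier, an 80/20 split stratified by color, and an arbitrary seed for reproducibility:

```{r split_data}
set.seed(2056)
# Split the data: 80% for training, 20% for testing, stratified by color
pumpkins_split <- pumpkins_select %>% 
  initial_split(prop = 0.8, strata = color)

# Extract the two sets
pumpkins_train <- training(pumpkins_split)
pumpkins_test <- testing(pumpkins_split)

# Peek at the training set
pumpkins_train %>% 
  slice_head(n = 5)
```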
🙌 We are now ready to train a model by fitting the training features to the training label (color).
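Fitting happens through a tidymodels workflow, which needs two ingredients: a preprocessing recipe and a model specification. A sketch of both, assuming the object names (`pumpkins_recipe`, `log_reg`) that the workflow below expects and the column names used in this lesson:

```{r recipe_spec}
# A recipe: predict color from everything else, encoding item_size as an
# integer and the remaining nominal predictors as dummy variables
pumpkins_recipe <- recipe(color ~ ., data = pumpkins_train) %>% 
  step_integer(item_size, zero_based = TRUE) %>% 
  step_dummy(all_nominal_predictors())

# A model specification: logistic regression via the glm engine
log_reg <- logistic_reg() %>% 
  set_engine("glm") %>% 
  set_mode("classification")
```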
```{r workflow}
# Bundle the recipe and the model specification into a workflow
log_reg_wf <- workflow() %>% 
  add_recipe(pumpkins_recipe) %>% 
  add_model(log_reg)

# Print out the workflow
log_reg_wf
```
After a workflow has been *specified*, a model can be `trained` using the [`fit()`](https://tidymodels.github.io/parsnip/reference/fit.html) function. The workflow will estimate a recipe and preprocess the data before training, so we won't have to manually do that using `prep()` and `bake()`.
```{r train}
# Train the model
wf_fit <- log_reg_wf %>%
  fit(data = pumpkins_train)

# Print the trained workflow
wf_fit
```
The model printout shows the coefficients learned during training.
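To work with those coefficients as data rather than console output, one option is to pull the fitted engine out of the workflow and tidy it - a sketch using `extract_fit_parsnip()` from workflows and `tidy()` from broom:

```{r tidy_coefficients}
# Extract the underlying parsnip fit, then get a tidy tibble of coefficients
wf_fit %>% 
  extract_fit_parsnip() %>% 
  broom::tidy()
```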
The [**`conf_mat()`**](https://tidymodels.github.io/yardstick/reference/conf_mat.html) function from yardstick calculates this cross-tabulation of observed and predicted classes.
```{r conf_mat}
# Confusion matrix for prediction results
conf_mat(data = results, truth = color, estimate = .pred_class)
```
Let's interpret the confusion matrix. Our model is asked to classify pumpkins between two binary categories, category `white` and category `not-white`. Counts on the matrix's diagonal are pumpkins it classified correctly; the off-diagonal counts are its false positives and false negatives.
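Those counts can be rolled up into familiar summary metrics with yardstick, using the same `results`, `color`, and `.pred_class` objects as above - a quick sketch:

```{r summary_metrics}
# Overall accuracy: the fraction of pumpkins labelled correctly
results %>% accuracy(truth = color, estimate = .pred_class)

# Precision and recall for the first factor level
results %>% precision(truth = color, estimate = .pred_class)
results %>% recall(truth = color, estimate = .pred_class)
```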
But for now, congratulations 🎉🎉🎉! You've completed these regression lessons!
You R awesome!
![Artwork by \@allison_horst](../../images/r_learners_sm.jpeg)