Now that you have the tools you need to start building machine learning models with Scikit-Learn, you are ready to start asking questions of your data.
In this lesson, you will learn:
- Preparing your data for model-building
- Two data visualization techniques and libraries
- Using Matplotlib for data visualization
### Asking the Right Question
As you work with data and apply ML solutions, it's very important to understand how to ask the right question to properly unlock the potential of your dataset.
The question you need answered determines which type of ML algorithm you will use. For example, do you need to tell cars from trucks as they cruise down a highway in a video feed? You will need a highly performant classification model to make that distinction. It will also need to perform object detection, probably by drawing bounding boxes around the detected cars and trucks.
In this final lesson on Regression, one of the basic 'classic' ML techniques, we will take a look at Logistic Regression. You would use this technique to discover patterns that predict which category an item belongs to.
In this lesson, you will learn:
- A new library for data visualization
- Techniques for Logistic Regression
### Prerequisite
Having worked with the pumpkin data, we are now familiar enough with it to see that there is one small category we can work with: Color. Let's build a Logistic Regression model that predicts, given a pumpkin's size, what color it will be (orange or white). There is also a 'striped' category in our dataset, but it has few instances, so we will not use it.
> 🎃 Fun fact, we sometimes call white pumpkins 'ghost' pumpkins. They aren't very easy to carve, so they aren't as popular as the orange ones but they are cool looking!
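As a rough illustration of the idea above, the following sketch fits Scikit-Learn's `LogisticRegression` to a handful of made-up pumpkins. The rows, the size labels, the ordinal encoding in `size_order`, and all variable names are assumptions for illustration only; they are not taken from the lesson's dataset or notebook.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: NOT the lesson's dataset, just enough rows to run.
toy = pd.DataFrame({
    "Item Size": ["sml", "sml", "med", "med", "lge", "lge", "xlge", "xlge"],
    "Color": ["WHITE", "WHITE", "WHITE", "ORANGE",
              "ORANGE", "ORANGE", "ORANGE", "ORANGE"],
})

# Assumed ordinal encoding of the size labels, smallest to largest.
size_order = {"sml": 0, "med": 1, "lge": 2, "xlge": 3}

# One numeric feature (size) and one categorical target (color).
X = toy["Item Size"].map(size_order).to_frame("size_code")
y = toy["Color"]

# Fit a binary logistic regression with Scikit-Learn's default settings.
model = LogisticRegression()
model.fit(X, y)

# Predict the color of a small and an extra-large pumpkin.
new_pumpkins = pd.DataFrame({"size_code": [size_order["sml"], size_order["xlge"]]})
predictions = model.predict(new_pumpkins)
probabilities = model.predict_proba(new_pumpkins)  # columns follow model.classes_

print(model.classes_)
print(predictions)
print(probabilities)
```

Each row of `predict_proba` gives the estimated probability of each color, in the order listed in `model.classes_`, which is what makes logistic regression a natural fit for a two-category question like this one.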
### Preparation
### Preparation
We have loaded up the [starter notebook](./notebook.ipynb) with pumpkin data once again and cleaned it so as to preserve a dataset containing Color and Item Size.
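The notebook's actual cleaning code is not reproduced here, but one plausible way to arrive at a dataset with just Color and Item Size is sketched below with pandas. The sample rows and the `City Name` column are hypothetical stand-ins, and the choice to drop rows with missing values is an assumption.

```python
import pandas as pd

# Hypothetical rows standing in for the pumpkin data; only the "Color" and
# "Item Size" column names come from the lesson text.
raw = pd.DataFrame({
    "City Name": ["BALTIMORE", "BALTIMORE", "BOSTON", "BOSTON", "CHICAGO"],
    "Item Size": ["med", "lge", None, "sml", "xlge"],
    "Color": ["ORANGE", "WHITE", "ORANGE", None, "STRIPED"],
})

# Keep the two columns of interest and drop rows where either is missing.
pumpkins = raw[["Color", "Item Size"]].dropna()

# Set aside the rare 'striped' category so only orange and white remain.
pumpkins = pumpkins[pumpkins["Color"].isin(["ORANGE", "WHITE"])]
pumpkins = pumpkins.reset_index(drop=True)

print(pumpkins)
```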