# Get started with Python and Scikit-learn for regression models

![Summary of regressions in a sketchnote](../../sketchnotes/ml-regression.png)

> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)

## [Pre-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/9/)

> ### [This lesson is available in R!](./solution/R/lesson_1-R.ipynb)

## Introduction

In these four lessons, you will discover how to build regression models. We will discuss what these are for shortly. But before you do anything, make sure you have the right tools in place to start the process!

In this lesson, you will learn how to:

- Configure your computer for local machine learning tasks.
- Work with Jupyter notebooks.
- Use Scikit-learn, including installation.
- Explore linear regression with a hands-on exercise.

## Installations and configurations

[![ML for beginners - Setup your tools ready to build Machine Learning models](https://img.youtube.com/vi/-DfeD2k2Kj0/0.jpg)](https://youtu.be/-DfeD2k2Kj0 "ML for beginners -Setup your tools ready to build Machine Learning models")

> 🎥 Click the image above for a short video working through configuring your computer for ML.

1. **Install Python**. Ensure that [Python](https://www.python.org/downloads/) is installed on your computer. You will use Python for many data science and machine learning tasks. Most computer systems already include a Python installation. There are useful [Python Coding Packs](https://code.visualstudio.com/learn/educators/installers?WT.mc_id=academic-77952-leestott) available as well, to ease the setup for some users.

   Some uses of Python, however, require one version of the software, whereas others require a different version. For this reason, it's useful to work within a [virtual environment](https://docs.python.org/3/library/venv.html).

2. **Install Visual Studio Code**. Make sure you have Visual Studio Code installed on your computer. Follow these instructions to [install Visual Studio Code](https://code.visualstudio.com/) for the basic installation. You are going to use Python in Visual Studio Code in this course, so you might want to brush up on how to [configure Visual Studio Code](https://docs.microsoft.com/learn/modules/python-install-vscode?WT.mc_id=academic-77952-leestott) for Python development.

   > Get comfortable with Python by working through this collection of [Learn modules](https://docs.microsoft.com/users/jenlooper-2911/collections/mp1pagggd5qrq7?WT.mc_id=academic-77952-leestott)
   >
   > [![Setup Python with Visual Studio Code](https://img.youtube.com/vi/yyQM70vi7V8/0.jpg)](https://youtu.be/yyQM70vi7V8 "Setup Python with Visual Studio Code")
   >
   > 🎥 Click the image above for a video: using Python within VS Code.

3. **Install Scikit-learn** by following [these instructions](https://scikit-learn.org/stable/install.html). Because you need to use Python 3, working in a virtual environment is recommended. Note: if you are installing this library on an M1 Mac, there are special instructions on the page linked above.

4. **Install Jupyter Notebook**. You will need to [install the Jupyter package](https://pypi.org/project/jupyter/).
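
Once the installations above are done, a quick check from Python can confirm that the libraries used in this lesson are visible to your interpreter. The following is a minimal sketch, not part of the original lesson: the helper name `check_packages` is hypothetical, and the list of module names is an assumption based on the imports used later on this page.

```python
import importlib.util
import sys


def check_packages(names=("numpy", "matplotlib", "sklearn")):
    """Map each module name to True if it can be found by this interpreter."""
    # find_spec returns None when a top-level module is not installed
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Show which Python is running, then report each package
print("Python", sys.version.split()[0])
for name, installed in check_packages().items():
    print(f"{name}: {'found' if installed else 'missing'}")
```

If a package is reported as missing, make sure the virtual environment where you installed it is the one that is currently active.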

## Your ML authoring environment

You are going to use **notebooks** to develop your Python code and create machine learning models. This type of file is a common tool for data scientists, and it can be identified by its suffix or extension, `.ipynb`.

Notebooks are an interactive environment that allows the developer to both write code and add notes and documentation around it, which is quite helpful for experimental or research-oriented projects.

[![ML for beginners - Set up Jupyter Notebooks to start building regression models](https://img.youtube.com/vi/7E-jC8FLA2E/0.jpg)](https://youtu.be/7E-jC8FLA2E "ML for beginners - Set up Jupyter Notebooks to start building regression models")

> 🎥 Click the image above for a short video working through this exercise.

### Exercise - work with a notebook

In this folder, you will find the file _notebook.ipynb_.

1. Open _notebook.ipynb_ in Visual Studio Code.

   A Jupyter server will start, running Python 3. The notebook contains cells of code that can be `run`. You can run a code cell by selecting the icon that looks like a play button.

1. Select the `md` icon and add a bit of markdown with the following text: **# Welcome to your notebook**.

   Next, add some Python code.

1. Type **print('hello notebook')** in the code block.
1. Select the arrow to run the code.

   You should see the printed statement:

    ```output
    hello notebook
    ```

![VS Code with a notebook open](images/notebook.jpg)

You can interleave your code with comments to self-document the notebook.

✅ Think for a minute how different a web developer's working environment is versus that of a data scientist.

## Up and running with Scikit-learn

Now that Python is set up in your local environment, and you are comfortable with Jupyter notebooks, let's get equally comfortable with Scikit-learn (pronounce it `sci` as in `science`). Scikit-learn provides an [extensive API](https://scikit-learn.org/stable/modules/classes.html#api-ref) to help you perform ML tasks.

According to their [website](https://scikit-learn.org/stable/getting_started.html), "Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities."

In this course, you will use Scikit-learn and other tools to build machine learning models to perform what we call 'traditional machine learning' tasks. We have deliberately avoided neural networks and deep learning, as they are better covered in our forthcoming 'AI for Beginners' curriculum.

Scikit-learn makes it straightforward to build models and evaluate them for use. It is primarily focused on using numeric data and contains several ready-made datasets for use as learning tools. It also includes pre-built models for students to try. Let's explore the process of loading prepackaged data and using a built-in estimator to train your first ML model with Scikit-learn.

## Exercise - your first Scikit-learn notebook

> This tutorial was inspired by the [linear regression example](https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py) on Scikit-learn's web site.


[![ML for beginners - Your First Linear Regression Project in Python](https://img.youtube.com/vi/2xkXL5EUpS0/0.jpg)](https://youtu.be/2xkXL5EUpS0 "ML for beginners - Your First Linear Regression Project in Python")

> 🎥 Click the image above for a short video working through this exercise.

In the _notebook.ipynb_ file associated with this lesson, clear out all the cells by pressing the 'trash can' icon.

In this section, you will work with a small dataset about diabetes that is built into Scikit-learn for learning purposes. Imagine that you wanted to test a treatment for diabetic patients. Machine Learning models might help you determine which patients would respond better to the treatment, based on combinations of variables. Even a very basic regression model, when visualized, might show information about variables that would help you organize your theoretical clinical trials.

✅ There are many types of regression methods, and which one you pick depends on the answer you're looking for. If you want to predict the probable height for a person of a given age, you'd use linear regression, as you're seeking a **numeric value**. If you're interested in discovering whether a type of cuisine should be considered vegan or not, you're looking for a **category assignment** so you would use logistic regression. You'll learn more about logistic regression later. Think a bit about some questions you can ask of data, and which of these methods would be more appropriate.

Let's get started on this task.

### Import libraries

For this task we will import some libraries:

- **matplotlib**. It's a useful [graphing tool](https://matplotlib.org/) and we will use it to create a line plot.
- **numpy**. [numpy](https://numpy.org/doc/stable/user/whatisnumpy.html) is a useful library for handling numeric data in Python.
- **sklearn**. This is the [Scikit-learn](https://scikit-learn.org/stable/user_guide.html) library.

1. Add imports by typing the following code:

   ```python
   import matplotlib.pyplot as plt
   import numpy as np
   from sklearn import datasets, linear_model, model_selection
   ```

   Above, you import `matplotlib` and `numpy`, along with `datasets`, `linear_model`, and `model_selection` from `sklearn`. `model_selection` is used to split data into training and test sets.

### The diabetes dataset

The built-in [diabetes dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) includes 442 samples of data around diabetes, with 10 feature variables, some of which include:

- age: age in years
- bmi: body mass index
- bp: average blood pressure
- s1 tc: total serum cholesterol

✅ This dataset includes the concept of 'sex' as a feature variable important to research around diabetes. Many medical datasets include this type of binary classification. Think a bit about how categorizations such as this might exclude certain parts of a population from treatments.
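
To see all ten feature names rather than the few listed above, you can load the dataset without unpacking it. The following is a small sketch: when called with no arguments, `load_diabetes()` returns a `Bunch` object that carries the data together with its metadata. The variable name `bmi_index` is only illustrative.

```python
from sklearn import datasets

# With no arguments, load_diabetes() returns a Bunch holding data and metadata
diabetes = datasets.load_diabetes()

print(diabetes.feature_names)
# ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
print(diabetes.data.shape)    # (442, 10)
print(diabetes.target.shape)  # (442,)

# Look up the column position of a feature by name
bmi_index = diabetes.feature_names.index("bmi")
print(bmi_index)  # 2
```

The position of `bmi` matters later in this lesson, where a single column is selected by its index.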

Now, load up the X and y data.

> 🎓 Remember, this is supervised learning, and we need a named 'y' target.

In a new code cell, load the diabetes dataset by calling `load_diabetes()`. The input `return_X_y=True` signals that `X` will be a data matrix, and `y` will be the regression target.

1. Add some print commands to show the shape of the data matrix and its first element:

    ```python
    X, y = datasets.load_diabetes(return_X_y=True)
    print(X.shape)
    print(X[0])
    ```

    The function returns a tuple of two values, which you assign to `X` and `y` respectively. Learn more [about tuples](https://wikipedia.org/wiki/Tuple).

    You can see that this data has 442 items shaped in arrays of 10 elements:

    ```text
    (442, 10)
    [ 0.03807591  0.05068012  0.06169621  0.02187235 -0.0442235  -0.03482076
    -0.04340085 -0.00259226  0.01990842 -0.01764613]
    ```

    ✅ Think a bit about the relationship between the data and the regression target. Linear regression predicts relationships between feature X and target variable y. Can you find the [target](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) for the diabetes dataset in the documentation? What is this dataset demonstrating, given that target?

2. Next, select a portion of this dataset to plot by selecting the 3rd column of the dataset. You can do this by using the `:` operator to select all rows, and then selecting the 3rd column using the index (2). Selecting a single column produces a 1D array, so reshape the data into a 2D array, which Scikit-learn estimators expect as input, by using `reshape(n_rows, n_columns)`. If one of the parameters is -1, the corresponding dimension is calculated automatically.

   ```python
   X = X[:, 2]
   X = X.reshape((-1,1))
   ```

   ✅ At any time, print out the data to check its shape.

3. Now that the data is in the right shape, you can see whether a machine can find a line that fits the numbers in this dataset. To do this, you need to split both the data (X) and the target (y) into training and test sets. Scikit-learn has a straightforward way to do this: `train_test_split` shuffles the samples and holds out the fraction given by `test_size` (here, 33%) for testing.

   ```python
   X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.33)
   ```

4. Now you are ready to train your model! Load up the linear regression model and train it with your X and y training sets using `model.fit()`:

    ```python
    model = linear_model.LinearRegression()
    model.fit(X_train, y_train)
    ```

    ✅ `model.fit()` is a function you'll see in many ML libraries, such as TensorFlow.

5. Then, create a prediction from the test data, using the function `predict()`. This prediction will be used to draw the regression line through the data.

    ```python
    y_pred = model.predict(X_test)
    ```

6. Now it's time to show the data in a plot. Matplotlib is a very useful tool for this task. Create a scatterplot of all the X and y test data, and use the prediction to draw the fitted line through the data points.

    ```python
    plt.scatter(X_test, y_test,  color='black')
    plt.plot(X_test, y_pred, color='blue', linewidth=3)
    plt.xlabel('Scaled BMIs')
    plt.ylabel('Disease Progression')
    plt.title('A Graph Plot Showing Diabetes Progression Against BMI')
    plt.show()
    ```

   ![a scatterplot showing datapoints around diabetes](./images/scatterplot.png)

   ✅ Think a bit about what's going on here. A straight line is running through many small dots of data, but what is it doing exactly? Can you see how you should be able to use this line to predict where a new, unseen data point should fit in relationship to the plot's y axis? Try to put into words the practical use of this model.
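
The column selection and `reshape` call from step 2 can be easier to follow on a tiny array. The sketch below uses a made-up 4x3 matrix (not the diabetes data) purely to show how the shapes change; the variable names are illustrative.

```python
import numpy as np

# A small stand-in matrix: 4 rows (samples) and 3 columns (features)
matrix = np.arange(12).reshape((4, 3))

# Selecting one column with an integer index gives a 1D array
third_column = matrix[:, 2]
print(third_column)        # [ 2  5  8 11]
print(third_column.shape)  # (4,)

# -1 lets NumPy work out the number of rows; 1 fixes a single column
as_column = third_column.reshape((-1, 1))
print(as_column.shape)     # (4, 1)
```

The same pattern applied to the diabetes data turns the 442 BMI values into an array of shape `(442, 1)`.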

Congratulations, you built your first linear regression model, created a prediction with it, and displayed it in a plot!
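
To put the fitted line into words, it can help to read off its slope and intercept and to predict a value for a new data point. The following sketch repeats the steps of the exercise in one cell. Two things in it are assumptions added for illustration and are not part of the lesson: `random_state=0`, which makes the shuffled split repeatable, and the scaled BMI value `0.05`, which is a hypothetical input.

```python
import numpy as np
from sklearn import datasets, linear_model, metrics, model_selection

# Load the data and keep only the BMI column as a 2D array
X, y = datasets.load_diabetes(return_X_y=True)
X = X[:, 2].reshape((-1, 1))

# random_state is an added assumption so the split is the same on every run
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.33, random_state=0
)

model = linear_model.LinearRegression()
model.fit(X_train, y_train)

# The fitted line is: progression = slope * scaled_bmi + intercept
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope: {slope:.1f}, intercept: {intercept:.1f}")

# Predict disease progression for a hypothetical, unseen scaled BMI value
new_bmi = np.array([[0.05]])
predicted = model.predict(new_bmi)[0]
print(f"predicted progression: {predicted:.1f}")

# Two common ways to measure how well the line fits the held-out test data
y_pred = model.predict(X_test)
print("mean squared error:", metrics.mean_squared_error(y_test, y_pred))
print("R^2 score:", metrics.r2_score(y_test, y_pred))
```

The prediction is simply the point on the line above the chosen BMI value, which is the practical use of the model: estimating the target for inputs it has not seen.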

---
## 🚀Challenge

Plot a different variable from this dataset. Hint: edit this line: `X = X[:, 2]`. Given this dataset's target, what are you able to discover about the progression of diabetes as a disease?

## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/10/)

## Review & Self Study

In this tutorial, you worked with simple linear regression, which uses a single feature, rather than multiple linear regression, which uses several. Read a little about the differences between these methods, or take a look at [this video](https://www.coursera.org/lecture/quantifying-relationships-regression-models/linear-vs-nonlinear-categorical-variables-ai2Ef).
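
As a starting point for that reading, the sketch below fits the same `LinearRegression` estimator twice on the diabetes data: once with only the BMI column (one feature) and once with all ten features. It is an illustration rather than part of the lesson, and `random_state=0` is an added assumption to make the split repeatable.

```python
from sklearn import datasets, linear_model, model_selection

X, y = datasets.load_diabetes(return_X_y=True)

# random_state is an added assumption so both models see the same split
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.33, random_state=0
)

# Indexing with a list ([2]) keeps the result 2D, so no reshape is needed
simple = linear_model.LinearRegression().fit(X_train[:, [2]], y_train)
multiple = linear_model.LinearRegression().fit(X_train, y_train)

# One coefficient per feature used by each model
print(simple.coef_.shape)    # (1,)
print(multiple.coef_.shape)  # (10,)

# score() returns the R^2 of the predictions on the given data
print("one feature, test R^2:", simple.score(X_test[:, [2]], y_test))
print("ten features, test R^2:", multiple.score(X_test, y_test))
```

Comparing the two scores gives a first impression of how much the additional features help on data the models have not seen.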

Read more about the concept of regression and think about what kinds of questions can be answered by this technique. Take this [tutorial](https://docs.microsoft.com/learn/modules/train-evaluate-regression-models?WT.mc_id=academic-77952-leestott) to deepen your understanding.

## Assignment

[A different dataset](assignment.md)