The process of building, using, and maintaining machine learning models and the data they use is very different from many other development workflows. For web developers, machine learning techniques can initially seem very strange. In this lesson, we will demystify the process by outlining it. You will:
- Understand the processes underpinning machine learning at a high level.
- Explore base concepts such as 'models', 'predictions', and 'training data'.
At a high level, the craft of creating machine learning (ML) processes comprises a number of steps:
1. **Decide on the question**. Most ML processes start by asking a question that cannot be answered by a simple conditional program or rules-based engine. These questions often revolve around predictions based on a collection of data.
2. **Collect and prepare data**. To be able to answer your question, you need data. The quality and, sometimes, quantity of your data will determine how well you can answer your initial question. Visualizing data is an important aspect of this phase. This phase also includes splitting the data into a training and testing group to build a model.
3. **Choose a training method**. Depending on your question and the nature of your data, you need to choose how you want to train a model to best reflect your data and make accurate predictions against it. This is the part of your ML process that requires specific expertise and, often, a considerable amount of experimentation.
4. **Train the model**. Using your training data, you use various algorithms to train a model to recognize patterns in the data. The model might leverage internal weights that can be adjusted to privilege certain parts of the data over others to build a better model.
5. **Evaluate the model**. You use never-before-seen data (your testing data) from your collected set to see how the model is performing.
6. **Parameter tuning**. Based on the performance of your model, you can redo the process using different parameters, or variables, that control the behavior of the algorithms used to train the model.
7. **Predict**. Use new input to test the accuracy of your model.
## What question to ask
Computers are particularly skilled at discovering hidden patterns in data. This utility is very helpful for researchers who have questions about a given domain that cannot be easily answered by creating a conditionally-based rules engine. Given an actuarial task, for example, a data scientist might be able to construct handcrafted rules around the mortality of smokers vs non-smokers.
When many other variables are brought into the equation, however, a ML model might prove more efficient to predict future mortality rates based on past health history. A more cheerful example might be making weather predictions for the month of April in a given location based on data that includes latitude, longitude, climate change, proximity to the ocean, patterns of the jet stream, and more.
✅ This [slide deck](https://www2.cisl.ucar.edu/sites/default/files/0900%20June%2024%20Haupt_0.pdf) on weather models offers a historical perspective for using ML in weather analysis.
## Pre-Building Tasks
Before starting to build your model, there are several tasks you need to complete. To test your question and form a hypothesis based on a model's predictions, you need to identify and configure several elements.
### Data
To be able to answer your question with any kind of certainty, you need a good amount of data of the right type. There are two things you need to do at this point:
- **Collect data**. Keeping in mind the previous lesson on fairness in data analysis, collect your data with care. Be aware of the sources of this data, any inherent biases it might have, and document its origin.
- **Prepare data**. There are several steps in the data preparation process. You might need to collate data and normalize it if it comes from diverse sources. You can improve the data's quality and quantity through various methods such as converting strings to numbers (as we do in [Clustering](../../5-Clustering/1-Visualize/README.md)). You might also generate new data, based on the original (as we do in [Classification](../../4-Classification/1-Introduction/README.md)). You can clean and edit the data (as we did prior to the [Web App](../3-Web-App/README.md) lesson). Finally, you might also need to randomize and shuffle it, depending on your training techniques.
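A minimal sketch of a few of these preparation steps, assuming a hypothetical pandas DataFrame (this is illustrative, not the curriculum's own code):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Hypothetical raw data: one string column, one numeric column
df = pd.DataFrame({'color': ['orange', 'white', 'orange'],
                   'size': [5.0, 7.5, 6.2]})

# Convert strings to numbers
df['color'] = LabelEncoder().fit_transform(df['color'])

# Normalize numeric values to a 0-1 range
df['size'] = MinMaxScaler().fit_transform(df[['size']]).ravel()

# Shuffle the rows
df = df.sample(frac=1).reset_index(drop=True)
print(df)
```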
✅ After collecting and processing your data, take a moment to see if its shape will allow you to address your intended question. It may be that the data will not perform well in your given task, as we discover in our [Clustering](../../5-Clustering/1-Visualize/README.md) lessons!
### Selecting your feature variable
A [feature](https://www.datasciencecentral.com/profiles/blogs/an-introduction-to-variable-and-feature-selection) is a measurable property of your data. In many datasets it is expressed as a column heading like 'date', 'size', or 'color'. Your target, usually represented as `y` in code, represents the answer to the question you are trying to ask of your data: in December, what **color** pumpkins will be cheapest? In San Francisco, what neighborhoods will have the best real estate **price**?
🎓 **Feature Selection and Feature Extraction** How do you know which variable to choose when building a model? You'll probably go through a process of feature selection or feature extraction to choose the right variables for the most performant model. They're not the same thing, however: "Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features." [source](https://wikipedia.org/wiki/Feature_selection)
### Visualize your data
An important aspect of the data scientist's toolkit is the power to visualize data using several excellent libraries such as Seaborn or Matplotlib. Representing your data visually might allow you to uncover hidden correlations that you can leverage. Your visualizations might also help you to uncover bias or unbalanced data (as we discover in [Classification](../../4-Classification/2-Classifiers-1/README.md)).
### Split your dataset
Prior to training, you need to split your dataset into two or more parts of unequal size, but representing the data well.
- **Training**. This part of the dataset goes into your model to train it. The size of this chunk constitutes the majority of the original dataset.
- **Testing**. A test dataset is another independent group of data, often gathered from the original data, that you use to confirm the performance of the built model.
- **Validating**. A validation set is a smaller independent group of examples that you use to tune the model's hyperparameters, or architecture, to improve the model. Depending on your data's size and the question you are asking, you might not need to build this third set (as we noted in [Time Series Forecasting](../7-TimeSeries/1-Introduction/README.md)).
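A minimal sketch of such a split, using Scikit-Learn's `train_test_split` with placeholder arrays (illustrative, not the lesson's own code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # placeholder feature matrix (50 samples)
y = np.arange(50)                  # placeholder targets

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Optionally carve a validation set out of the remaining training data
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25)

print(len(X_train), len(X_val), len(X_test))  # 30 10 10
```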
## Building a model
Using your training data, your goal is to build a model, or a statistical representation of your data, using various algorithms to **train** it. Training a model exposes it to data and allows it to make assumptions about perceived patterns it discovers, validates, and accepts or rejects.
### Decide on a training method
Depending on your question and the nature of your data, you will choose a method to train it. Stepping through [Scikit-Learn's documentation](https://scikit-learn.org/stable/user_guide.html) - which we use in this course - you can explore many ways to train a model. Depending on your experience, you might have to try several different methods to build the best model. You are likely to go through a process whereby data scientists evaluate the performance of a model by feeding it unseen data, checking for accuracy, bias, and other quality-degrading issues, and selecting the most appropriate training method for the task at hand.
### Train
Armed with your training data, you are ready to 'fit' it to create a model. You will notice that in many ML libraries you will find the code 'model.fit' - it is at this time that you send in your data as an array of values (usually 'X') and a target variable (usually 'y').
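For instance, a minimal Scikit-Learn sketch with made-up numbers (illustrative only):

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]  # the data: one feature column
y = [2, 4, 6, 8]          # the target: the answers we want the model to learn

model = LinearRegression()
model.fit(X, y)  # 'fit' adjusts the model's internal weights to the data
print(model.coef_, model.intercept_)  # roughly [2.] and 0.0
```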
### Evaluate the model
Once the training process is complete (it can take many iterations, or 'epochs', to train a large model), you will be able to evaluate the model's quality by using test data to gauge its performance. This data is a subset of the original data that the model has not previously analyzed. You can print out a table of metrics about your model's quality.
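Scikit-Learn's `metrics` module can print such numbers. Here is a sketch with placeholder arrays standing in for real test targets and predictions; the choice of metrics is an illustrative assumption for a regression task:

```python
import numpy as np
from sklearn import metrics

# Placeholder: true test targets vs. a model's predictions on that test data
y_test = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4])

print('MAE:', metrics.mean_absolute_error(y_test, y_pred))
print('MSE:', metrics.mean_squared_error(y_test, y_pred))
print('R2: ', metrics.r2_score(y_test, y_pred))
```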
🎓 Model Fitting
In the context of machine learning, model fitting refers to the accuracy of the model's underlying function as it attempts to analyze data with which it is not familiar.
🎓 **Underfitting** and **overfitting** are common problems that degrade the quality of the model as the model fits either not well enough or too well. This causes the model to make predictions either too closely aligned or too loosely aligned with its training data. An overfit model predicts training data too well because it has learned the data's details and noise too well. An underfit model is not accurate as it can neither accurately analyze its training data nor data it has not yet 'seen'.
## Parameter tuning
Once your initial training is complete, observe the quality of the model and consider improving it by tweaking its 'hyperparameters'. Read more about the process [here](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters?WT.mc_id=academic-15963-cxa).
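One common approach is a grid search over candidate hyperparameter values. A sketch using Scikit-Learn's `GridSearchCV`; the `Ridge` estimator, its `alpha` grid, and the placeholder data are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Placeholder training data: 20 samples, 1 feature
X_train = np.arange(20).reshape(-1, 1)
y_train = 3 * X_train.ravel() + 5

# Try several values of the regularization strength 'alpha' with 5-fold cross-validation
search = GridSearchCV(Ridge(), {'alpha': [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
```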
## Prediction
This is the moment where you can use completely new data to test your model's accuracy. In an 'applied' ML setting, where you are building web assets to use the model in production, this process might involve gathering user input (a button press, for example) to set a variable and send it to the model for inference, or evaluation.
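Stripped of the web plumbing, inference boils down to a single call on fresh input. A sketch, reusing the kind of toy model from the training example above:

```python
from sklearn.linear_model import LinearRegression

# A model trained earlier on toy data (see the training sketch above)
model = LinearRegression().fit([[1], [2], [3], [4]], [2, 4, 6, 8])

# A brand-new observation, e.g. gathered from a form field in a web app
new_input = [[5]]
print(model.predict(new_input))  # roughly [10.]
```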
In these lessons, you will discover how to use these steps to prepare, build, test, evaluate, and predict - all the gestures of a data scientist and more, as you progress in your journey to become a 'full stack' ML engineer.
---
## 🚀Challenge
Draw a flow chart reflecting the steps of a ML practitioner. Where do you see yourself right now in the process? Where do you predict you will find difficulty? What seems easy to you?
In your company, in a user group, or among your friends or fellow students, talk to someone who works professionally as a data scientist. Write a short paper (500 words) about their daily occupations. Are they specialists, or do they work 'full stack'?
| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | --------- | -------- | ----------------- |
|          | An essay of the correct length, with attributed sources, is presented as a .doc file | The essay is poorly attributed or shorter than the required length | No essay is presented |
In this section of the curriculum, you will be introduced to the base concepts underlying the field of machine learning, what it is, and learn about its history and the techniques researchers use to work with it.
### Lessons
1. [Introduction to Machine Learning](1-intro-to-ML/README.md)
1. [The History of Machine Learning and AI](2-history-of-ML/README.md)
1. [Fairness and Machine Learning](3-fairness/README.md)
1. [Techniques of Machine Learning](4-techniques-of-ML/README.md)
### Credits
"The History of Machine Learning" was written with ♥️ by [Jen Looper](https://twitter.com/jenlooper) and [Amy Boyd](https://twitter.com/AmyKateNicho)
"Fairness and Machine Learning" was written with ♥️ by [Tomomi Imura](https://twitter.com/girliemac)
"Fairness and Machine Learning" was written with ♥️ by [Tomomi Imura](https://twitter.com/girliemac)
"Techniques of Machine Learning" was written with ♥️ by [Jen Looper](https://twitter.com/jenlooper) and [Chris Noring](https://twitter.com/softchris)
The lessons in this section cover types of Regression in the context of machine learning. Regression models can help determine the _relationship_ between variables. This type of model can predict values such as length, temperature, or age, thus uncovering relationships between variables as it analyzes data points.
In this series of lessons, you'll discover the difference between Linear vs. Logistic Regression, and when you should use one or the other.
In these four lessons, you will discover how to build Regression models. We will discuss what these are for shortly. But before you do anything, make sure you have the right tools in place to start the process!
In this lesson, you will learn how to:
- Configure your computer for local machine learning tasks.
- Work with Jupyter notebooks.
- Use Scikit-Learn, including installation.
- Explore Linear Regression with a hands-on exercise.
## Installations and Configurations
[![Using Python with Visual Studio Code](https://img.youtube.com/vi/7EXd4_ttIuw/0.jpg)](https://youtu.be/7EXd4_ttIuw "Using Python with Visual Studio Code")
> 🎥 Click the image above for a video: using Python within VS Code.
1. **Install Python**. Ensure that [Python](https://www.python.org/downloads/) is installed on your computer. You will use Python for many data science and machine learning tasks. Most computer systems already include a Python installation. There are useful [Python Coding Packs](https://code.visualstudio.com/learn/educators/installers?WT.mc_id=academic-15963-cxa) available as well, to ease the setup for some users.
Some usages of Python, however, require one version of the software, whereas others require a different version. For this reason, it's useful to work within a [virtual environment](https://docs.python.org/3/library/venv.html).
2. **Install Visual Studio Code**. Make sure you have Visual Studio Code installed on your computer. Follow these instructions to [install Visual Studio Code](https://code.visualstudio.com/) for the basic installation. You are going to use Python in Visual Studio Code in this course, so you might want to brush up on how to [configure Visual Studio Code](https://docs.microsoft.com/learn/modules/python-install-vscode?WT.mc_id=academic-15963-cxa) for Python development.
> Get comfortable with Python by working through this collection of [Learn modules](https://docs.microsoft.com/users/jenlooper-2911/collections/mp1pagggd5qrq7?WT.mc_id=academic-15963-cxa)
3. **Install Scikit-Learn**, by following [these instructions](https://scikit-learn.org/stable/install.html). Since you need to ensure that you use Python 3, it's recommended that you use a virtual environment. Note, if you are installing this library on an M1 Mac, there are special instructions on the page linked above.
4. **Install Jupyter Notebook**. You will need to [install the Jupyter package](https://pypi.org/project/jupyter/).
## Your ML Authoring Environment
You are going to use **notebooks** to develop your Python code and create machine learning models. This type of file is a common tool for data scientists, and notebooks can be identified by their suffix or extension `.ipynb`.
Notebooks are an interactive environment that allows a developer both to write code and to add notes and documentation around that code, which is quite helpful for experimental or research-oriented projects.
### Exercise - Work with a Notebook
In this folder, you will find the file _notebook.ipynb_.
1. Open _notebook.ipynb_ in Visual Studio Code. Assuming VS Code is properly configured, a Jupyter server will start with Python 3+ started. You will find areas of the notebook that can be `run`, pieces of code. You can run a code block by selecting the icon that looks like a play button.
1. Select the `md` icon and add a bit of markdown, and the following text **# Welcome to your notebook**.
   Next, add some Python code.
1. Type **print('hello notebook')** in the code block.
1. Select the arrow to run the code.
   You should see the printed statement:
```output
hello notebook
```
![VS Code with a notebook open](images/notebook.png)
You can interleave your code with comments to self-document the notebook.
✅ Think for a minute how different a web developer's working environment is versus that of a data scientist.
## Up and Running with Scikit-Learn
Now that Python is set up in your local environment and you are comfortable with Jupyter notebooks, let's get equally comfortable with Scikit-Learn (pronounce it `sci` as in `science`). Scikit-Learn provides an [extensive API](https://scikit-learn.org/stable/modules/classes.html#api-ref) to help you perform ML tasks.
According to their [website](https://scikit-learn.org/stable/getting_started.html), "Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities."
### Let's unpack some of this jargon:
> 🎓 A machine learning **model** is a mathematical model that generates predictions given data to which it has not been exposed. It builds these predictions based on its analysis of data and extrapolating patterns.
> 🎓 **[Supervised Learning](https://wikipedia.org/wiki/Supervised_learning)** works by mapping an input to an output based on example pairs. It uses **labeled** training data to build a function to make predictions. [Download a printable Zine about Supervised Learning](https://zines.jenlooper.com/zines/supervisedlearning.html). Regression, which is covered in this group of lessons, is a type of supervised learning.
> 🎓 **[Unsupervised Learning](https://wikipedia.org/wiki/Unsupervised_learning)** works similarly, but it maps pairs using **unlabeled data**. [Download a printable Zine about Unsupervised Learning](https://zines.jenlooper.com/zines/unsupervisedlearning.html)
> 🎓 **[Model Fitting](https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html#sphx-glr-auto-examples-model-selection-plot-underfitting-overfitting-py)** in the context of machine learning refers to the accuracy of the model's underlying function as it attempts to analyze data with which it is not familiar. **Underfitting** and **overfitting** are common problems that degrade the quality of the model, as the model fits either not well enough or too well. This causes the model to make predictions either too closely aligned or too loosely aligned with its training data. An overfit model predicts training data too well because it has learned the data's details and noise too well. An underfit model is not accurate as it can neither accurately analyze its training data nor data it has not yet 'seen'.
![overfitting vs. correct model](images/overfitting.png)
> Infographic by [Jen Looper](https://twitter.com/jenlooper)
> 🎓 **Data Preprocessing** is the process whereby data scientists clean and convert data for use in the machine learning lifecycle.
> 🎓 **Model Selection and Evaluation** is the process whereby data scientists evaluate the performance of a model, or any other relevant metric of a model, by feeding it unseen data, selecting the most appropriate model for the task at hand.
> 🎓 **Feature Variable** A [feature](https://www.datasciencecentral.com/profiles/blogs/an-introduction-to-variable-and-feature-selection) is a measurable property of your data. In many datasets it is expressed as a column heading like 'date', 'size', or 'color'.
> 🎓 **[Training and Testing](https://wikipedia.org/wiki/Training,_validation,_and_test_sets) datasets** Throughout this curriculum, you will divide up a dataset into at least two parts: one large group of data for 'training' and a smaller part for 'testing'. Sometimes you'll also find a 'validation' set. A training set is the group of examples you use to train a model. A validation set is a smaller independent group of examples that you use to tune the model's hyperparameters, or architecture, to improve the model. A test dataset is another independent group of data, often gathered from the original data, that you use to confirm the performance of the built model.
> 🎓 **Feature Selection and Feature Extraction** How do you know which variable to choose when building a model? You'll probably go through a process of feature selection or feature extraction to choose the right variables for the most performant model. They're not the same thing, however: "Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features." [source](https://wikipedia.org/wiki/Feature_selection)
In this course, you will use Scikit-Learn and other tools to build machine learning models to perform what we call 'traditional machine learning' tasks. We have deliberately avoided neural networks and deep learning, as they are better covered in our forthcoming 'AI for Beginners' curriculum.
Scikit-Learn makes it straightforward to build models and evaluate them for use. It is primarily focused on using numeric data and contains several ready-made datasets for use as learning tools. It also includes pre-built models for students to try. Let's explore the process of loading prepackaged data and using a built-in estimator to create a first ML model with Scikit-Learn, using some basic data.
## Exercise - Your First Scikit-Learn Notebook
> This tutorial was inspired by the [Linear Regression example](https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py) on Scikit-Learn's web site.
In the _notebook.ipynb_ file associated with this lesson, clear out all the cells by pressing the 'trash can' icon.
In this section, you will work with a small dataset about diabetes that is built into Scikit-Learn for learning purposes. Imagine that you wanted to test a treatment for diabetic patients. Machine Learning models might help you determine which patients would respond better to the treatment, based on combinations of variables. Even a very basic Regression model, when visualized, might show information about variables that would help you organize your theoretical clinical trials.
✅ There are many types of Regression methods, and which one you pick depends on the answer you're looking for. If you want to predict the probable height for a person of a given age, you'd use Linear Regression, as you're seeking a **numeric value**. If you're interested in discovering whether a type of cuisine should be considered vegan or not, you're looking for a **category assignment** so you would use Logistic Regression. You'll learn more about Logistic Regression later. Think a bit about some questions you can ask of data, and which of these methods would be more appropriate.
Let's get started on this task.
### Import libraries
For this task we will import some libraries:
- **matplotlib**. It's a useful [graphing tool](https://matplotlib.org/) and we will use it to create a line plot.
- **numpy**. [numpy](https://numpy.org/doc/stable/user/whatisnumpy.html) is a useful library for handling numeric data in Python.
- **sklearn**. This is the Scikit-Learn library.
Import some libraries to help with your tasks.
1. Add imports by typing the following code:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model, model_selection
```
Above, you are importing `matplotlib` and `numpy`, and you are importing `datasets`, `linear_model` and `model_selection` from `sklearn`. `model_selection` is used for splitting data into training and test sets.
### The diabetes dataset
The built-in [diabetes dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) includes 442 samples of data around diabetes, with 10 feature variables, some of which include:
- age: age in years
- bmi: body mass index
Now, load up the X and y data.
> 🎓 Remember, this is supervised learning, and we need a named 'y' target.
In a new code cell, load the diabetes dataset by calling `load_diabetes()`. The input `return_X_y=True` signals that `X` will be a data matrix, and `y` will be the regression target.
1. Add some print commands to show the shape of the data matrix and its first element:
```python
X, y = datasets.load_diabetes(return_X_y=True)
print(X.shape)
print(X[0])
```
What you are getting back as a response is a tuple. You are assigning the first two values of the tuple to `X` and `y` respectively. Learn more [about tuples](https://wikipedia.org/wiki/Tuple).
> 🎓 A **tuple** is an [ordered list of elements](https://wikipedia.org/wiki/Tuple).
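As a quick illustration (a hypothetical snippet, not part of the lesson's notebook), tuple unpacking works like this:

```python
# A tuple is an ordered, immutable sequence of values
coordinates = (442, 10)
rows, columns = coordinates  # unpack the two values into separate names
print(rows)     # 442
print(columns)  # 10
```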
You can see that this data has 442 items shaped in arrays of 10 elements.
✅ Think a bit about the relationship between the data and the regression target. Linear regression predicts relationships between feature X and target variable y. Can you find the [target](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) for the diabetes dataset in the documentation? What is this dataset demonstrating, given that target?
1. Next, select a portion of this dataset to plot by arranging it into a new array using numpy's `newaxis` function. We are going to use Linear Regression to generate a line between values in this data, according to a pattern it determines.
```python
X = X[:, np.newaxis, 2]
```
✅ At any time, print out the data to check its shape.
1. Now that you have data ready to be plotted, you can see if a machine can help determine a logical split between the numbers in this dataset. To do this, you need to split both the data (X) and the target (y) into test and training sets. Scikit-Learn has a straightforward way to do this; you can split your test data at a given point.
```python
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.33)
```
1. Now you are ready to train your model! Load up the Linear Regression model and train it with your X and y training sets using `model.fit()`:
```python
model = linear_model.LinearRegression()
model.fit(X_train, y_train)
```
✅ `model.fit()` is a function you'll see in many ML libraries such as TensorFlow.
1. Then, create a prediction using test data, using the function `predict()`. This will be used to draw the line between data groups.
```python
y_pred = model.predict(X_test)
```
1. Now it's time to show the data in a plot. Matplotlib is a very useful tool for this task. Create a scatterplot of all the X and y test data, and use the prediction to draw a line in the most appropriate place, between the model's data groupings.
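A sketch of this plotting step, continuing the `X_test`, `y_test`, and `y_pred` variables from the previous steps (the axis labels are illustrative assumptions; column 2 of the dataset is `bmi`):

```python
plt.scatter(X_test, y_test, color='black')           # the held-out datapoints
plt.plot(X_test, y_pred, color='blue', linewidth=3)  # the model's fitted line
plt.xlabel('Scaled BMIs')                            # illustrative label
plt.ylabel('Disease Progression')                    # illustrative label
plt.show()
```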
![a scatterplot showing datapoints around diabetes](./images/scatterplot.png)
✅ Think a bit about what's going on here. A straight line is running through many small dots of data, but what is it doing exactly? Can you see how you should be able to use this line to predict where a new, unseen data point should fit in relationship to the plot's y axis? Try to put into words the practical use of this model.
Congratulations, you built your first Linear Regression model, created a prediction with it, and displayed it in a plot!
---
## 🚀Challenge
Plot a different variable from this dataset. Hint: edit this line: `X = X[:, np.newaxis, 2]`. Given this dataset's target, what are you able to discover about the progression of diabetes as a disease?
So far you have explored what regression is with sample data gathered from the pumpkin pricing dataset that we will use throughout this unit. You have also visualized it using Matplotlib. Now you are ready to dive deeper into regression for ML. In this lesson, you will learn more about two types of regression: basic linear regression and polynomial regression, along with some of the math underlying these techniques.
Test several different variables in this notebook to see how correlation corresponds to model accuracy.
## 🚀Challenge
There's a lot more to unpack regarding Logistic Regression! But the best way to learn is to experiment. Find a dataset that lends itself to this type of analysis and build a model with it. What do you learn? Tip: try [Kaggle](https://kaggle.com) for interesting datasets.
In North America, pumpkins are often carved into scary faces for Halloween.
## What you will learn
In this group of lessons, you will get set up to begin machine learning tasks, including configuring Visual Studio code to manage notebooks, the common environment for data scientists. You will discover Scikit-Learn, a library for machine learning, and you will build your first models, focusing on Regression models in this chapter.
> There are useful low-code tools that can help you learn about working with Regression models. Try [Azure ML for this task](https://docs.microsoft.com/learn/modules/create-regression-model-azure-machine-learning-designer/?WT.mc_id=academic-15963-cxa)
In this lesson, you will train a Linear Regression model and a Classification model on a dataset that's out of this world: UFO Sightings over the past century, sourced from [NUFORC's database](https://www.nuforc.org). We will continue our use of notebooks to clean data and train our model, but you can take the process one step further by exploring using a model 'in the wild', so to speak: in a web app. To do this, you need to build a web app using Flask.
There are several ways to build web apps to consume machine learning models. Your web architecture may influence the way your model is trained. Imagine that you are working in a business where the data science group has trained a model that they want you to use in an app. There are many questions you need to ask: Is it a web app, or a mobile app? Where will the model reside, in the cloud or locally? Does the app have to work offline? And what technology was used to train the model, because that may influence the tooling you need to use?
Using a model this way, with Flask and a pickled model, is relatively straightforward.
Instead of working in a notebook and importing the model to the Flask app, you could train the model right within the Flask app! Try converting your Python code in the notebook, perhaps after your data is cleaned, to train the model from within the app on a route called `train`. What are the pros and cons of pursuing this method?
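A bare-bones sketch of that idea, with hypothetical route names and a toy in-memory model (not the lesson's actual app):

```python
from flask import Flask, jsonify
import numpy as np
from sklearn.linear_model import LogisticRegression

app = Flask(__name__)
model = LogisticRegression()

@app.route('/train')
def train():
    # In a real app, load and clean your dataset here instead of toy arrays
    X = np.array([[0], [1], [2], [3]])
    y = np.array([0, 0, 1, 1])
    model.fit(X, y)
    return jsonify({'status': 'trained'})

@app.route('/predict/<int:value>')
def predict(value):
    # Call /train first, otherwise the model is not fitted yet
    result = model.predict(np.array([[value]]))
    return jsonify({'prediction': int(result[0])})

if __name__ == '__main__':
    app.run()
```

One trade-off worth weighing: retraining on a route keeps the model fresh without redeploying, but a long-running `fit` call blocks the request while it runs.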
Classification is a form of [supervised learning](https://wikipedia.org/wiki/Supervised_learning).
Remember, Linear Regression helped you predict relationships between variables and make accurate predictions on where a new datapoint would fall in relationship to that line. So, you could predict what price a pumpkin would be in September vs. December, for example. Logistic Regression helped you discover binary categories: at this price point, is this pumpkin orange or not-orange?
Classification uses various algorithms to determine a data point's label or class. Let's work with this cuisine data to see whether, by observing a group of ingredients, we can determine its cuisine of origin.
This fresh CSV can now be found in the root data folder.
This curriculum contains several interesting datasets. Dig through the `data` folders and see if any contain datasets that would be appropriate for binary or multi-class classification. What questions would you ask of this dataset?
In this lesson, you will use the dataset you saved from the last lesson full of balanced, clean data all about cuisines. You will use this dataset with a variety of classifiers to predict a given national cuisine based on a group of ingredients. While doing so, you'll learn more about some of the ways that algorithms can be leveraged for classification tasks.
In this lesson, you used your cleaned data to build a machine learning model that can predict a national cuisine based on a series of ingredients. Take some time to read through the many options Scikit-Learn provides to classify data. Dig deeper into the concept of 'solver' to understand what goes on behind the scenes.
Dig a little more into the math behind Logistic Regression in [this lesson](https://people.eecs.berkeley.edu/~russell/classes/cs194/f11/lectures/CS194%20Fall%202011%20Lecture%2006.pdf).
This method of Machine Learning "combines the predictions of several base estimators".
Each of these techniques has a large number of parameters that you can tweak. Research each one's default parameters and think about what tweaking these parameters would mean for the model's quality.
There's a lot of jargon in these lessons, so take a minute to review [this list](https://docs.microsoft.com/en-us/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-15963-cxa) of useful terminology!
One of the most useful practical uses of machine learning is building recommendation systems.
[![Recommendation Systems Introduction](https://img.youtube.com/vi/giIXNoiqO_U/0.jpg)](https://youtu.be/giIXNoiqO_U "Recommendation Systems Introduction")
> 🎥 Click the image above for a video: Andrew Ng introduces recommendation system design
- How to build a model and save it as an ONNX model
Congratulations, you have created a simple web app recommendation with a few fields.
Your web app is very minimal, so continue to build it out using ingredients and their indexes from the [ingredient_indexes](../data/ingredient_indexes.csv) data. What flavor combinations work to create a given national dish?
While this lesson just touched on the utility of creating a recommendation system for food ingredients, this area of ML applications is very rich in examples. Read some more about how these systems are built:
Clustering is a type of [Unsupervised Learning](https://wikipedia.org/wiki/Unsupervised_learning).
[![No One Like You by PSquare](https://img.youtube.com/vi/ty2advRiWJM/0.jpg)](https://youtu.be/ty2advRiWJM "No One Like You by PSquare")
> 🎥 Click the image above for a video. While you're studying Machine Learning with Clustering, enjoy some Nigerian Dance Hall tracks - this is a highly rated song from 2014 by PSquare.
[Clustering](https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-30164-8_124) is very useful for data exploration. Let's see if it can help discover trends and patterns in the way Nigerian audiences consume music.
In general, for clustering, you can use scatterplots to show clusters of data.
## 🚀Challenge
In preparation for the next lesson, make a chart about the various clustering algorithms you might discover and use in a production environment. What kinds of problems is clustering trying to address?
In this lesson, you will learn how to create clusters using Scikit-Learn and the Nigerian music dataset you imported earlier. We will cover the basics of K-Means for Clustering. Keep in mind that, as you learned in the earlier lesson, there are many ways to work with clusters and the method you use depends on your data. We will try K-Means as it's the most common Clustering technique. Let's get started!
Spend some time with this notebook, tweaking parameters. Can you improve the accuracy?
Hint: Try to scale your data. There's commented code in the notebook that adds Standard Scaling to make the data columns resemble each other more closely in terms of range. You'll find that while the silhouette score goes down, the 'kink' in the elbow graph smooths out. This is because leaving the data unscaled allows data with less variance to carry more weight. Read a bit more on this problem [here](https://stats.stackexchange.com/questions/21222/are-mean-normalization-and-feature-scaling-needed-for-k-means-clustering/21226#21226).
Take a look at Stanford's K-Means Simulator [here](https://stanford.edu/class/engr108/visualizations/kmeans/kmeans.html). You can use this tool to visualize sample data points and determine its centroids. With fresh data, click 'update' to see how long it takes to find convergence. You can edit the data's randomness, numbers of clusters and numbers of centroids. Does this help you get an idea of how the data can be grouped?
Also, take a look at [this handout on k-means](https://stanford.edu/~cpiech/cs221/handouts/kmeans.html) from Stanford.
# Common Natural Language Processing Tasks and Techniques
For most *Natural Language Processing* tasks, the text to be processed must be broken down, examined, and the results stored or cross-referenced with rules and data sets. This allows the programmer to derive the meaning, the intent, or merely the frequency of terms and words in a text.
Let's discover common techniques used in processing text. Combined with machine learning, these techniques help you to analyze large amounts of text efficiently. Before applying ML to these tasks, however, let's understand the problems encountered by an NLP specialist.
One possible solution to the task is [here](solution/bot.py).
Take a task in the prior knowledge check and try to implement it. Test the bot on a friend. Can it trick them? Can you make your bot more 'believable'?
In the previous lessons you learned how to build a basic bot using TextBlob, a library that embeds ML behind-the-scenes to perform basic NLP tasks such as noun phrase extraction. Another important challenge in computational linguistics is accurate *translation* of a sentence from one spoken or written language to another.
This is a very hard problem compounded by the fact that there are thousands of languages and each can have very different grammar rules. One approach is to convert the formal grammar rules for one language, such as English, into a non-language dependent structure, and then translate it by converting back to another language. This means that you would take the following steps:
Here is a sample [solution](solutions/book.py).
Can you make Marvin even better by extracting other features from the user input?
A very nice plot, showing a model with good accuracy. Well done!
Dig into the ways to test the accuracy of a Time Series Model. We touch on MAPE in this lesson, but are there other methods you could use? Research them and annotate them. A helpful document can be found [here](https://otexts.com/fpp2/accuracy.html).
This lesson touches on only the basics of Time Series Forecasting with ARIMA. Take some time to deepen your knowledge by digging into [this repository](https://microsoft.github.io/forecasting/) and its various model types to learn other ways to build Time Series models.
In this lesson, we will explore the world of **[Peter and the Wolf](https://en.wikipedia.org/wiki/Peter_and_the_Wolf)**, inspired by a musical fairy tale by a Russian composer, [Sergei Prokofiev](https://en.wikipedia.org/wiki/Sergei_Prokofiev). We will use **Reinforcement Learning** to let Peter explore his environment, collect tasty apples and avoid meeting the wolf.
What we also observe on this graph is that, at some point, the length increased.
Overall, it is important to remember that the success and quality of the learning process significantly depend on parameters such as learning rate, learning rate decay, and discount factor. Those are often called **hyperparameters**, to distinguish them from **parameters**, which we optimize during training (e.g. Q-Table coefficients). The process of finding the best hyperparameter values is called **hyperparameter optimization**, and it deserves a separate topic.
Azure Cloud Advocates at Microsoft are pleased to offer a 12-week, 24-lesson curriculum.
Travel with us around the world as we apply these classic techniques to data from many areas of the world. Each lesson includes pre- and post-lesson quizzes, written instructions to complete the lesson, a solution, an assignment and more. Our project-based pedagogy allows you to learn while building, a proven way for new skills to 'stick'.
**✍️ Hearty thanks to our authors** Jen Looper, Stephen Howell, Francesca Lazzeri, Tomomi Imura, Cassie Breviu, Dmitry Soshnikov, Chris Noring, Ornella Altunyan, and Amy Boyd
**🎨 Thanks as well to our illustrators** Tomomi Imura, Dasani Madipalli, and Jen Looper
By ensuring that the content aligns with projects, the process is made more engaging.
> **A note about quizzes**: All quizzes are contained [in this app](https://jolly-sea-0a877260f.azurestaticapps.net), for 48 total quizzes of three questions each. They are linked from within the lessons but the quiz app can be run locally; follow the instructions in the `quiz-app` folder.
| Lesson Number | Topic | Group | Learning Objectives | Linked Lesson | Author |
| :-: | :- | :- | :- | :- | :-: |
| 01 | Introduction to Machine Learning | [Introduction](1-Introduction/README.md) | Learn the basic concepts behind Machine Learning | [lesson](1-Introduction/1-intro-to-ML/README.md) | Muhammad |
| 02 | The History of Machine Learning | [Introduction](1-Introduction/README.md) | Learn the history underlying this field | [lesson](1-Introduction/2-history-of-ML/README.md) | Jen and Amy |
| 03 | Fairness and Machine Learning | [Introduction](1-Introduction/README.md) | What are the important philosophical issues around fairness that students should consider when building and applying ML models? | [lesson](1-Introduction/3-fairness/README.md) | Tomomi |
| 04 | Techniques for Machine Learning | [Introduction](1-Introduction/README.md) | What techniques do ML researchers use to build ML models? | [lesson](1-Introduction/4-techniques-of-ML/README.md) | Chris and Jen |
| 05 | Introduction to Regression | [Regression](2-Regression/README.md) | Get started with Python and Scikit-Learn for Regression models | [lesson](2-Regression/1-Tools/README.md) | Jen |
| 06 | North American Pumpkin Prices 🎃 | [Regression](2-Regression/README.md) | Visualize and clean data in preparation for ML | [lesson](2-Regression/2-Data/README.md) | Jen |
| 07 | North American Pumpkin Prices 🎃 | [Regression](2-Regression/README.md) | Build Linear and Polynomial Regression models | [lesson](2-Regression/3-Linear/README.md) | Jen |
| 08 | North American Pumpkin Prices 🎃 | [Regression](2-Regression/README.md) | Build a Logistic Regression model | [lesson](2-Regression/4-Logistic/README.md) | Jen |
| 09 | A Web App 🔌 | [Web App](3-Web-App/README.md) | Build a Web app to use your trained model | [lesson](3-Web-App/README.md) | Jen |
| 10 | Introduction to Classification | [Classification](4-Classification/README.md) | Clean, Prep, and Visualize your Data; Introduction to Classification | [lesson](4-Classification/1-Introduction/README.md) | Jen and Cassie |
| 11 | Delicious Asian and Indian Cuisines 🍜 | [Classification](4-Classification/README.md) | Introduction to Classifiers | [lesson](4-Classification/2-Classifiers-1/README.md) | Jen and Cassie |
| 12 | Delicious Asian and Indian Cuisines 🍜 | [Classification](4-Classification/README.md) | More Classifiers | [lesson](4-Classification/3-Classifiers-2/README.md) | Jen and Cassie |
| 13 | Delicious Asian and Indian Cuisines 🍜 | [Classification](4-Classification/README.md) | Build a Recommender Web App using your Model | [lesson](4-Classification/4-Applied/README.md) | Jen |
| 14 | Introduction to Clustering | [Clustering](5-Clustering/README.md) | Clean, Prep, and Visualize your Data; Introduction to Clustering | [lesson](5-Clustering/1-Visualize/README.md) | Jen |
| 16 | Introduction to Natural Language Processing ☕️ | [Natural Language Processing](6-NLP/README.md) | Learn the basics about NLP by building a simple bot | [lesson](6-NLP/1-Introduction-to-NLP/README.md) | Stephen |
| 17 | Common NLP Tasks ☕️ | [Natural Language Processing](6-NLP/README.md) | Deepen your NLP knowledge by understanding common tasks required when dealing with language structures | [lesson](6-NLP/2-Tasks/README.md) | Stephen |
| 18 | Translation and Sentiment Analysis ♥️ | [Natural Language Processing](6-NLP/README.md) | Translation and Sentiment analysis with Jane Austen | [lesson](6-NLP/3-Translation-Sentiment/README.md) | Stephen |
| 19 | Romantic Hotels of Europe ♥️ | [Natural Language Processing](6-NLP/README.md) | Sentiment analysis, continued | [lesson]() | Stephen |
| 20 | Introduction to Time Series Forecasting | [Time Series](7-TimeSeries/README.md) | Introduction to Time Series Forecasting | [lesson](7-TimeSeries/1-Introduction/README.md) | Francesca |
| 21 | ⚡️ World Power Usage ⚡️ Time Series Forecasting with ARIMA ⚡️ | [Time Series](7-TimeSeries/README.md) | Time Series Forecasting with ARIMA | [lesson](7-TimeSeries/2-ARIMA/README.md) | Francesca |
| 22 | Introduction to Reinforcement Learning | [Reinforcement Learning](8-Reinforcement/README.md) | Introduction to Reinforcement Learning with Q-Learning | [lesson](8-Reinforcement/1-QLearning/README.md) | Dmitry |
| 23 | Help Peter avoid the Wolf! 🐺 | [Reinforcement Learning](8-Reinforcement/README.md) | Reinforcement Learning Gym | [lesson](8-Reinforcement/2-Gym/README.md) | Dmitry |
| 24 | Real-World ML Scenarios and Applications | [ML in the Wild](9-Real-World/README.md) | Interesting and Revealing real-world applications of classical ML | [lesson](9-Real-World/1-Applications/README.md) | Team |
## Offline access
You can run this documentation offline by using [Docsify](https://docsify.js.org/#/). Fork this repo, [install Docsify](https://docsify.js.org/#/quickstart) on your local machine, and then in the root folder of this repo, type `docsify serve`. The website will be served on port 3000 on your localhost: `localhost:3000`.