@ -35,7 +35,7 @@ Travel with us around the world as we apply these classic techniques to data fro
---
**Teachers**, we have [included some suggestions](for-teachers.md) on how to use this curriculum. If you would like to create your own lessons, we have also included a [lesson template](lesson-template/README.md)
**Teachers**, we have [included some suggestions](for-teachers.md) on how to use this curriculum.
> Future space for Promo Video
@ -67,32 +67,32 @@ By ensuring that the content aligns with projects, the process is made more enga
> **A note about quizzes**: All quizzes are contained [in this app](https://jolly-sea-0a877260f.azurestaticapps.net), for 48 total quizzes of three questions each. They are linked from within the lessons, but the quiz app can also be run locally; follow the instructions in the `quiz-app` folder.
| 01 | [Introduction](Introduction/README.md) | Introduction to Machine Learning | Learn the basic concepts behind Machine Learning | [lesson](Introduction/1-intro-to-ML/README.md) | Team |
| 02 | [Introduction](Introduction/README.md) | The History of Machine Learning | Learn the history underlying this field | [lesson](Introduction/2-history-of-ML/README.md) | Jen and Amy |
| 03 | [Introduction](Introduction/README.md) | Fairness and Machine Learning | What are the important philosophical issues around fairness that students should consider when building and applying ML models? | [lesson](Introduction/3-fairness/README.md) | Tomomi |
| 04 | Introduction to Regression | [Regression](Regression/README.md) | Get started with Python and Scikit-Learn for Regression models | [lesson](Regression/1-Tools/README.md) | Jen |
| 05 | North American Pumpkin Prices 🎃 | [Regression](Regression/README.md) | Visualize and clean data in preparation for ML | [lesson](Regression/2-Data/README.md) | Jen |
| 06 | North American Pumpkin Prices 🎃 | [Regression](Regression/README.md) | Build Linear and Polynomial Regression models | [lesson](Regression/3-Linear/README.md) | Jen |
| 07 | North American Pumpkin Prices 🎃 | [Regression](Regression/README.md) | Build a Logistic Regression model | [lesson](Regression/4-Logistic/README.md) | Jen |
| 08 | A Web App 🔌 | [Web App](Web-App/README.md) | Build a Web app to use your trained model | [lesson](Web-App/README.md) | Jen |
| 09 | Introduction to Classification | [Classification](Classification/README.md) | Clean, Prep, and Visualize your Data; Introduction to Classification | [lesson](Classification/1-Data/README.md) | Cassie |
| 10 | Delicious Asian Recipes 🍜 | [Classification](Classification/README.md) | Build a Discriminative Model | [lesson](Classification/2-Descriminative/README.md) | Cassie |
| 11 | Delicious Asian Recipes 🍜 | [Classification](Classification/README.md) | Build a Generative Model | [lesson](Classification/3-Generative/README.md) | Cassie |
| 12 | Delicious Asian Recipes 🍜 | [Classification](Classification/README.md) | Build a Web App using your Model | [lesson](Classification/4-Applied/README.md) | Jen |
| 13 | Introduction to Clustering | [Clustering](Clustering/README.md) | Clean, Prep, and Visualize your Data; Introduction to Clustering | [lesson](Clustering/1-Visualize/README.md) | Jen |
| 15 | Introduction to Natural Language Processing ☕️ | [Natural Language Processing](NLP/README.md) | Learn the basics about NLP by building a simple bot | [lesson](NLP/1-Introduction-to-NLP/README.md) | Stephen |
| 16 | Common NLP Tasks ☕️ | [Natural Language Processing](NLP/README.md) | Deepen your NLP knowledge by understanding common tasks required when dealing with language structures | [lesson](NLP/2-Tasks/README.md) | Stephen |
| 17 | Translation and Sentiment Analysis ❤️ | [Natural Language Processing](NLP/README.md) | Translation and Sentiment analysis with Jane Austen | [lesson](NLP/3-Translation-Sentiment/README.md) | Stephen |
| 18 | Romantic Hotels of Europe ♥️ | [Natural Language Processing](NLP/README.md) | Sentiment analysis, continued | [lesson]() | Stephen |
| 19 | Romantic Hotels of Europe ♥️ | [Natural Language Processing](NLP/README.md) | Sentiment analysis, continued | [lesson]() | Stephen |
| 20 | Introduction to Time Series Forecasting | [Time Series](TimeSeries/README.md) | Introduction to Time Series Forecasting | [lesson](TimeSeries/1-Introduction/README.md) | Francesca |
| 21 | ⚡️ World Power Usage ⚡️ Time Series Forecasting with ARIMA ⚡️ | [Time Series](TimeSeries/README.md) | Time Series Forecasting with ARIMA | [lesson](TimeSeries/2-ARIMA/README.md) | Francesca |
| 23 | Help Peter avoid the Wolf! 🐺 | [Reinforcement Learning](Reinforcement/README.md) | tbd | [lesson]() | Dmitry |
| 24 | Real-World ML Scenarios and Applications | ML in the Wild | Interesting and Revealing real-world applications of classical ML | [lesson](Real-World/1-Applications/README.md) | Team |
## Offline access
You can run this documentation offline by using [Docsify](https://docsify.js.org/#/). Fork this repo, [install Docsify](https://docsify.js.org/#/quickstart) on your local machine, and then in the root folder of this repo, type `docsify serve`. The website will be served on port 3000 on your localhost: `localhost:3000`.
In this curriculum, you have learned many ways to prepare data for training and create machine learning models. You built a series of classic Regression, Clustering, Classification, Natural Language Processing, and Time Series models. Congratulations! Now, you might be wondering what it's all for...what are the real world applications for these models?
While a lot of interest in industry has been garnered by AI, which usually leverages Deep Learning, there are still valuable applications for classical machine learning models, some of which you use today, although you might not be aware of it. In this lesson, you'll explore how eight different industries and subject-matter domains use these types of models to make their applications more performant, reliable, intelligent, and thus more valuable to users.
@ -13,16 +13,18 @@ One of the major consumers of classical machine learning models is the finance i
We learned about [k-means clustering](Clustering/2-K-Means/README.md) earlier in the course, but how can it be used to solve problems related to credit card fraud?
K-means clustering comes in handy during a credit card fraud detection technique called **outlier detection**. Outliers, or deviations in observations about a set of data, can tell us if a credit card is being used in a normal capacity, or if something unusual is going on. As shown in [this paper](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.680.1195&rep=rep1&type=pdf), you can sort credit card data using a k-means clustering algorithm and assign each transaction to a cluster based on how much of an outlier it appears to be. Then, you can evaluate the riskiest cluster for fraudulent versus legitimate transactions.
K-means clustering comes in handy during a credit card fraud detection technique called **outlier detection**. Outliers, or deviations in observations about a set of data, can tell us if a credit card is being used in a normal capacity or if something unusual is going on. As shown in the paper below, you can sort credit card data using a k-means clustering algorithm and assign each transaction to a cluster based on how much of an outlier it appears to be. Then, you can evaluate the riskiest cluster for fraudulent versus legitimate transactions.
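
To make the idea concrete, here is a minimal sketch of cluster-based outlier detection. It is not the method from the paper: the transactions, the two features (amount and distance from home), the starting centroids, and the rule "the smallest cluster is the riskiest" are all assumptions chosen for illustration. It uses a small hand-written k-means so that it runs with the Python standard library alone; in practice you would more likely reach for Scikit-learn, as in the Clustering lessons.

```python
import math
from collections import Counter


def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm: return one cluster label per point."""
    centroids = list(initial_centroids)

    def nearest(point):
        # Index of the centroid closest to this point (Euclidean distance).
        return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p) for p in points]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return [nearest(p) for p in points]


# Hypothetical transactions: (amount in dollars, distance from home in km).
# Real features would normally be scaled before clustering.
transactions = [
    (12, 2), (15, 3), (9, 1), (14, 2),
    (80, 10), (95, 12), (85, 9), (90, 11),
    (900, 400),
]

labels = kmeans(transactions, initial_centroids=[transactions[0], transactions[4]])

# Assumption: treat the cluster with the fewest members as the riskiest one.
cluster_sizes = Counter(labels)
riskiest_cluster = min(cluster_sizes, key=cluster_sizes.get)
flagged = [t for t, label in zip(transactions, labels) if label == riskiest_cluster]
print(flagged)
```

On this toy data the single large, far-from-home purchase ends up alone in its own cluster, so it is the only transaction flagged for review. A real system would still need labeled examples to check how many flagged transactions are actually fraudulent.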
In wealth management, an individual or firm handles investments on behalf of their clients. Their job is to sustain and grow wealth in the long-term, so it is essential to choose investments that perform well.
One way to evaluate how a particular investment performs is through statistical regression. [Linear regression](Regression/1-Tools/README.md) is a valuable tool for understanding how a fund performs relative to some benchmark. We can also deduce whether or not the results of the regression are statistically significant, or how much they would affect a client's investments. You could even further expand your analysis using multiple regression, where additional risk factors can be taken into account. For an example of how this would work for a specific fund, check out [this paper](http://www.brightwoodventures.com/evaluating-fund-performance-using-regression/) on evaluating fund performance using regression.
One way to evaluate how a particular investment performs is through statistical regression. [Linear regression](Regression/1-Tools/README.md) is a valuable tool for understanding how a fund performs relative to some benchmark. We can also deduce whether or not the results of the regression are statistically significant, or how much they would affect a client's investments. You could even further expand your analysis using multiple regression, where additional risk factors can be taken into account. For an example of how this would work for a specific fund, check out the paper below on evaluating fund performance using regression.
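
As a rough illustration of regressing a fund against a benchmark, the sketch below fits a least-squares line to monthly returns using only the standard library. The return figures are made up (the fund values are constructed as 0.2 + 1.5 × benchmark so the result is easy to check), and the names `fit_line`, `benchmark_returns`, and `fund_returns` are hypothetical. In finance the slope of such a line is often called beta and the intercept alpha.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one predictor: return (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    covariance = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    variance = sum((xi - x_bar) ** 2 for xi in x)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical monthly returns, in percent.
benchmark_returns = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
fund_returns = [1.7, -2.8, 4.7, 0.95, -2.05, 3.2]

beta, alpha = fit_line(benchmark_returns, fund_returns)
print(f"beta = {beta:.2f}, alpha = {alpha:.2f}")
```

A slope above 1 suggests the fund amplifies the benchmark's moves, while a positive intercept suggests return beyond what the benchmark explains. Real return data is noisy, so you would also want a significance test and possibly multiple regression, as described above, before drawing conclusions.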
[Coursera](https://coursera.com) has a great tech blog where they discuss many engineering decisions. In this case study, they plotted a regression line to try to explore any correlation between a low NPS (Net Promoter Score) rating and course retention or drop-off.
You learned about Reinforcement Learning in previous lessons. It can be very useful when trying to predict patterns in nature. In particular, it could be used to track ecological problems like forest fires and the spread of invasive species. In Canada, a group of researchers used Reinforcement Learning to build forest wildfire dynamics models from satellite images. Using an innovative "spatially spreading process (SSP)", they envisioned a forest fire as "the agent at any cell in the landscape". "The set of actions the fire can take from a location at any point in time includes spreading north, south, east, or west or not spreading.
This approach inverts the usual RL setup since the dynamics of the corresponding Markov Decision Process (MDP) is a known function for immediate wildfire spread." Read more about the classic algorithms used by this group in this article: https://www.frontiersin.org/articles/10.3389/fict.2018.00006/full
This approach inverts the usual RL setup since the dynamics of the corresponding Markov Decision Process (MDP) is a known function for immediate wildfire spread." Read more about the classic algorithms used by this group.
While deep learning has created a revolution in visually tracking animal movements (you can build your own [polar bear tracker](https://docs.microsoft.com/en-us/learn/modules/build-ml-model-with-azure-stream-analytics/) here), classic ML still has a place in this task.
While deep learning has created a revolution in visually tracking animal movements (you can build your own [polar bear tracker](https://docs.microsoft.com/learn/modules/build-ml-model-with-azure-stream-analytics/?WT.mc_id=academic-15963-cxa) here), classic ML still has a place in this task.
Sensors to track movements of farm animals and IoT make use of this type of visual processing, but more basic ML techniques are useful to preprocess data. For example, in this paper, sheep postures were monitored and analyzed using various classifier algorithms. You will recognize the ROC curve on p. 335: https://druckhaus-hofmann.de/gallery/31-wj-feb-2020.pdf
Sensors to track movements of farm animals and IoT make use of this type of visual processing, but more basic ML techniques are useful to preprocess data. For example, in this paper, sheep postures were monitored and analyzed using various classifier algorithms. You will recognize the ROC curve on p. 335.
In our lesson on Time Series, we invoked the concept of smart parking meters to generate revenue for a town based on understanding supply and demand. This article discusses in detail how clustering, regression, and time series forecasting were combined to help predict future energy use in Ireland, based on smart metering: https://www-cdn.knime.com/sites/default/files/inline-images/knime_bigdata_energy_timeseries_whitepaper.pdf
In our lesson on Time Series, we invoked the concept of smart parking meters to generate revenue for a town based on understanding supply and demand. This article discusses in detail how clustering, regression, and time series forecasting were combined to help predict future energy use in Ireland, based on smart metering.
Detecting fake news has become a game of cat and mouse in today's media. In this article, researchers suggest that a system combining several of the ML techniques we have studied can be tested and the best model deployed: "This system is based on natural language processing to extract features from the data and then these features are used for the training of machine learning classifiers such as Naive Bayes, Support Vector Machine (SVM), Random Forest (RF), Stochastic Gradient Descent (SGD), and Logistic Regression(LR)."