Merge pull request #663 from Vidushi-Gupta/main

Fixed hyperlinks and knitted the Rmd files to html
pull/668/head
Carlotta Castelluccio 2 years ago committed by GitHub
commit a4285e6e58
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -18,7 +18,7 @@ In this lesson, we'll explore a variety of classifiers to *predict a given natio
### **Preparation**
This lesson builds up on our [previous lesson](https://github.com/microsoft/ML-For-Beginners/blob/main/4-Classification/1-Introduction/solution/lesson_10-R.ipynb) where we:
This lesson builds up on our [previous lesson](https://github.com/microsoft/ML-For-Beginners/blob/main/4-Classification/1-Introduction/solution/R/lesson_10.html) where we:
- Made a gentle introduction to classifications using a dataset about all the brilliant cuisines of Asia and India 😋.

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@ -99,7 +99,7 @@ There are over 100 clustering algorithms, and their use depends on the nature of
Clustering as a technique is greatly aided by proper visualization, so let's get started by visualizing our music data. This exercise will help us decide which of the methods of clustering we should most effectively use for the nature of this data.
1. Open the _notebook.ipynb_ file in this folder.
1. Open the [_notebook.ipynb_](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/1-Visualize/notebook.ipynb) file in this folder.
1. Import the `Seaborn` package for good data visualization.
@ -107,7 +107,7 @@ Clustering as a technique is greatly aided by proper visualization, so let's get
!pip install seaborn
```
1. Append the song data from _nigerian-songs.csv_. Load up a dataframe with some data about the songs. Get ready to explore this data by importing the libraries and dumping out the data:
1. Append the song data from [_nigerian-songs.csv_](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/data/nigerian-songs.csv). Load up a dataframe with some data about the songs. Get ready to explore this data by importing the libraries and dumping out the data:
```python
import matplotlib.pyplot as plt

File diff suppressed because one or more lines are too long

@ -1,9 +1,5 @@
# K-Means clustering
[![Andrew Ng explains Clustering](https://img.youtube.com/vi/hDmNF9JG3lo/0.jpg)](https://youtu.be/hDmNF9JG3lo "Andrew Ng explains Clustering")
> 🎥 Click the image above for a video: Andrew Ng explains clustering
## [Pre-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/29/)
In this lesson, you will learn how to create clusters using Scikit-learn and the Nigerian music dataset you imported earlier. We will cover the basics of K-Means for Clustering. Keep in mind that, as you learned in the earlier lesson, there are many ways to work with clusters and the method you use depends on your data. We will try K-Means as it's the most common clustering technique. Let's get started!
@ -36,7 +32,7 @@ One drawback of using K-Means includes the fact that you will need to establish
## Prerequisite
You will work in this lesson's _notebook.ipynb_ file that includes the data import and preliminary cleaning you did in the last lesson.
You will work in this lesson's [_notebook.ipynb_](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/2-K-Means/notebook.ipynb) file that includes the data import and preliminary cleaning you did in the last lesson.
## Exercise - preparation
@ -134,7 +130,7 @@ You see an array printed out with predicted clusters (0, 1,or 2) for each row of
## Silhouette score
Look for a silhouette score closer to 1. This score varies from -1 to 1, and if the score is 1, the cluster is dense and well-separated from other clusters. A value near 0 represents overlapping clusters with samples very close to the decision boundary of the neighboring clusters.[source](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam).
Look for a silhouette score closer to 1. This score varies from -1 to 1, and if the score is 1, the cluster is dense and well-separated from other clusters. A value near 0 represents overlapping clusters with samples very close to the decision boundary of the neighboring clusters. [(Source)](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam)
Our score is **.53**, so right in the middle. This indicates that our data is not particularly well-suited to this type of clustering, but let's continue.
@ -157,11 +153,11 @@ Our score is **.53**, so right in the middle. This indicates that our data is no
> 🎓 range: These are the iterations of the clustering process
> 🎓 random_state: "Determines random number generation for centroid initialization."[source](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans)
> 🎓 random_state: "Determines random number generation for centroid initialization." [Source](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans)
> 🎓 WCSS: "within-cluster sums of squares" measures the squared average distance of all the points within a cluster to the cluster centroid.[source](https://medium.com/@ODSC/unsupervised-learning-evaluating-clusters-bd47eed175ce).
> 🎓 WCSS: "within-cluster sums of squares" measures the squared average distance of all the points within a cluster to the cluster centroid. [Source](https://medium.com/@ODSC/unsupervised-learning-evaluating-clusters-bd47eed175ce).
> 🎓 Inertia: K-Means algorithms attempt to choose centroids to minimize 'inertia', "a measure of how internally coherent clusters are."[source](https://scikit-learn.org/stable/modules/clustering.html). The value is appended to the wcss variable on each iteration.
> 🎓 Inertia: K-Means algorithms attempt to choose centroids to minimize 'inertia', "a measure of how internally coherent clusters are." [Source](https://scikit-learn.org/stable/modules/clustering.html). The value is appended to the wcss variable on each iteration.
> 🎓 k-means++: In [Scikit-learn](https://scikit-learn.org/stable/modules/clustering.html#k-means) you can use the 'k-means++' optimization, which "initializes the centroids to be (generally) distant from each other, leading to probably better results than random initialization.
@ -224,7 +220,7 @@ Previously, you surmised that, because you have targeted 3 song genres, you shou
## Variance
Variance is defined as "the average of the squared differences from the Mean" [source](https://www.mathsisfun.com/data/standard-deviation.html). In the context of this clustering problem, it refers to data that the numbers of our dataset tend to diverge a bit too much from the mean.
Variance is defined as "the average of the squared differences from the Mean" [(Source)](https://www.mathsisfun.com/data/standard-deviation.html). In the context of this clustering problem, it refers to data that the numbers of our dataset tend to diverge a bit too much from the mean.
✅ This is a great moment to think about all the ways you could correct this issue. Tweak the data a bit more? Use different columns? Use a different algorithm? Hint: Try [scaling your data](https://www.mygreatlearning.com/blog/learning-data-science-with-k-means-clustering/) to normalize it and test other columns.

@ -206,7 +206,7 @@ Perfect, we have just partitioned our data set into a set of 3 groups. So, how g
### **Silhouette score**
[Silhouette analysis](https://en.wikipedia.org/wiki/Silhouette_(clustering)) can be used to study the separation distance between the resulting clusters. This score varies from -1 to 1, and if the score is near 1, the cluster is dense and well-separated from other clusters. A value near 0 represents overlapping clusters with samples very close to the decision boundary of the neighboring clusters.[source](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam).
[Silhouette analysis](https://en.wikipedia.org/wiki/Silhouette_(clustering)) can be used to study the separation distance between the resulting clusters. This score varies from -1 to 1, and if the score is near 1, the cluster is dense and well-separated from other clusters. A value near 0 represents overlapping clusters with samples very close to the decision boundary of the neighboring clusters. [(Source)](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam).
The average silhouette method computes the average silhouette of observations for different values of *k*. A high average silhouette score indicates a good clustering.
@ -339,7 +339,7 @@ In Scikit-learn's documentation, you can see that a model like this one, with cl
## **Variance**
Variance is defined as "the average of the squared differences from the Mean" [source](https://www.mathsisfun.com/data/standard-deviation.html). In the context of this clustering problem, it refers to data that the numbers of our dataset tend to diverge a bit too much from the mean.
Variance is defined as "the average of the squared differences from the Mean" [(Source)](https://www.mathsisfun.com/data/standard-deviation.html). In the context of this clustering problem, it refers to data that the numbers of our dataset tend to diverge a bit too much from the mean.
✅ This is a great moment to think about all the ways you could correct this issue. Tweak the data a bit more? Use different columns? Use a different algorithm? Hint: Try [scaling your data](https://www.mygreatlearning.com/blog/learning-data-science-with-k-means-clustering/) to normalize it and test other columns.

File diff suppressed because one or more lines are too long

@ -133,7 +133,7 @@ Let's create the bot next. We'll start by defining some phrases.
It was nice talking to you, goodbye!
```
One possible solution to the task is [here](solution/bot.py)
One possible solution to the task is [here](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/1-Introduction-to-NLP/solution/bot.py)
✅ Stop and consider

@ -187,7 +187,7 @@ Hmm, that's not great. Can you tell me more about old hounddogs?
It was nice talking to you, goodbye!
```
One possible solution to the task is [here](solution/bot.py)
One possible solution to the task is [here](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/2-Tasks/solution/bot.py)
✅ Knowledge Check

@ -143,7 +143,7 @@ Your task is to determine, using sentiment polarity, if *Pride and Prejudice* ha
1. If the polarity is 1 or -1 store the sentence in an array or list of positive or negative messages
5. At the end, print out all the positive sentences and negative sentences (separately) and the number of each.
Here is a sample [solution](solution/notebook.ipynb).
Here is a sample [solution](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/3-Translation-Sentiment/solution/notebook.ipynb).
✅ Knowledge Check

@ -202,7 +202,7 @@ Finally, and this is delightful (because it didn't take much processing at all),
| Family with older children | 26349 |
| With a pet | 1405 |
You could argue that `Travellers with friends` is the same as `Group` more or less, and that would be fair to combine the two as above. The code for identifying the correct tags is [the Tags notebook](solution/1-notebook.ipynb).
You could argue that `Travellers with friends` is the same as `Group` more or less, and that would be fair to combine the two as above. The code for identifying the correct tags is [the Tags notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb).
The final step is to create new columns for each of these tags. Then, for every review row, if the `Tag` column matches one of the new columns, add a 1, if not, add a 0. The end result will be a count of how many reviewers chose this hotel (in aggregate) for, say, business vs leisure, or to bring a pet to, and this is useful information when recommending a hotel.
@ -347,13 +347,13 @@ print("Saving results to Hotel_Reviews_NLP.csv")
df.to_csv(r"../data/Hotel_Reviews_NLP.csv", index = False)
```
You should run the entire code for [the analysis notebook](solution/3-notebook.ipynb) (after you've run [your filtering notebook](solution/1-notebook.ipynb) to generate the Hotel_Reviews_Filtered.csv file).
You should run the entire code for [the analysis notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/3-notebook.ipynb) (after you've run [your filtering notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb) to generate the Hotel_Reviews_Filtered.csv file).
To review, the steps are:
1. Original dataset file **Hotel_Reviews.csv** is explored in the previous lesson with [the explorer notebook](../4-Hotel-Reviews-1/solution/notebook.ipynb)
2. Hotel_Reviews.csv is filtered by [the filtering notebook](solution/1-notebook.ipynb) resulting in **Hotel_Reviews_Filtered.csv**
3. Hotel_Reviews_Filtered.csv is processed by [the sentiment analysis notebook](solution/3-notebook.ipynb) resulting in **Hotel_Reviews_NLP.csv**
1. Original dataset file **Hotel_Reviews.csv** is explored in the previous lesson with [the explorer notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/4-Hotel-Reviews-1/solution/notebook.ipynb)
2. Hotel_Reviews.csv is filtered by [the filtering notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb) resulting in **Hotel_Reviews_Filtered.csv**
3. Hotel_Reviews_Filtered.csv is processed by [the sentiment analysis notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/3-notebook.ipynb) resulting in **Hotel_Reviews_NLP.csv**
4. Use Hotel_Reviews_NLP.csv in the NLP Challenge below
### Conclusion

@ -71,9 +71,9 @@ In the next lesson, you will build an ARIMA model using [Univariate Time Series]
✅ Identify the variable that changes over time in this dataset
## Time Series [data characteristics](https://online.stat.psu.edu/stat510/lesson/1/1.1) to consider
## Time Series data characteristics to consider
When looking at time series data, you might notice that it has certain characteristics that you need to take into account and mitigate to better understand its patterns. If you consider time series data as potentially providing a 'signal' that you want to analyze, these characteristics can be thought of as 'noise'. You often will need to reduce this 'noise' by offsetting some of these characteristics using some statistical techniques.
When looking at time series data, you might notice that it has [certain characteristics](https://online.stat.psu.edu/stat510/lesson/1/1.1) that you need to take into account and mitigate to better understand its patterns. If you consider time series data as potentially providing a 'signal' that you want to analyze, these characteristics can be thought of as 'noise'. You often will need to reduce this 'noise' by offsetting some of these characteristics using some statistical techniques.
Here are some concepts you should know to be able to work with time series:

@ -34,7 +34,7 @@ Bottom line: ARIMA is used to make a model fit the special form of time series d
## Exercise - build an ARIMA model
Open the _/working_ folder in this lesson and find the _notebook.ipynb_ file.
Open the [_/working_](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/2-ARIMA/working) folder in this lesson and find the [_notebook.ipynb_](https://github.com/microsoft/ML-For-Beginners/blob/main/7-TimeSeries/2-ARIMA/working/notebook.ipynb) file.
1. Run the notebook to load the `statsmodels` Python library; you will need this for ARIMA models.

@ -24,7 +24,7 @@ In the last lesson you learned about ARIMA, which is a very successful statistic
The first few steps for data preparation are the same as that of the previous lesson on [ARIMA](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/2-ARIMA).
Open the _/working_ folder in this lesson and find the _notebook.ipynb_ file.[^2]
Open the [_/working_](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/3-SVR/working) folder in this lesson and find the [_notebook.ipynb_](https://github.com/microsoft/ML-For-Beginners/blob/main/7-TimeSeries/3-SVR/working/notebook.ipynb) file.[^2]
1. Run the notebook and import the necessary libraries: [^2]
@ -383,4 +383,4 @@ This lesson was to introduce the application of SVR for Time Series Forecasting.
[^1]: The text, code and output in this section was contributed by [@AnirbanMukherjeeXD](https://github.com/AnirbanMukherjeeXD)
[^2]: The text, code and output in this section was taken from [ARIMA](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/2-ARIMA)
[^2]: The text, code and output in this section was taken from [ARIMA](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/2-ARIMA)

@ -17,9 +17,9 @@ By using reinforcement learning and a simulator (the game), you can learn how to
In this lesson, we will be experimenting with some code in Python. You should be able to run the Jupyter Notebook code from this lesson, either on your computer or somewhere in the cloud.
You can open [the lesson notebook](notebook.ipynb) and walk through this lesson to build.
You can open [the lesson notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/8-Reinforcement/1-QLearning/notebook.ipynb) and walk through this lesson to build.
> **Note:** If you are opening this code from the cloud, you also need to fetch the [`rlboard.py`](rlboard.py) file, which is used in the notebook code. Add it to the same directory as the notebook.
> **Note:** If you are opening this code from the cloud, you also need to fetch the [`rlboard.py`](https://github.com/microsoft/ML-For-Beginners/blob/main/8-Reinforcement/1-QLearning/rlboard.py) file, which is used in the notebook code. Add it to the same directory as the notebook.
## Introduction
@ -41,7 +41,7 @@ Each cell in this board can either be:
* an **apple**, which represents something Peter would be glad to find in order to feed himself.
* a **wolf**, which is dangerous and should be avoided.
There is a separate Python module, [`rlboard.py`](rlboard.py), which contains the code to work with this environment. Because this code is not important for understanding our concepts, we will import the module and use it to create the sample board (code block 1):
There is a separate Python module, [`rlboard.py`](https://github.com/microsoft/ML-For-Beginners/blob/main/8-Reinforcement/1-QLearning/rlboard.py), which contains the code to work with this environment. Because this code is not important for understanding our concepts, we will import the module and use it to create the sample board (code block 1):
```python
from rlboard import *
@ -186,7 +186,7 @@ Suppose we are now at the state *s*, and we want to move to the next state *s'*.
This gives the **Bellman formula** for calculating the value of the Q-Table at state *s*, given action *a*:
<img src="images/bellmaneq.gif"/>
<img src="images/bellman-equation.png"/>
Here γ is the so-called **discount factor** that determines to which extent you should prefer the current reward over the future reward and vice versa.
@ -316,4 +316,5 @@ Overall, it is important to remember that the success and quality of the learnin
## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/46/)
## Assignment [A More Realistic World](assignment.md)
## Assignment
[A More Realistic World](assignment.md)

Binary file not shown.

After

Width:  |  Height:  |  Size: 13 KiB

@ -1,7 +1,7 @@
# CartPole Skating
The problem we have been solving in the previous lesson might seem like a toy problem, not really applicable for real life scenarios. This is not the case, because many real world problems also share this scenario - including playing Chess or Go. They are similar, because we also have a board with given rules and a **discrete state**.
https://white-water-09ec41f0f.azurestaticapps.net/
## [Pre-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/47/)
## Introduction
@ -331,10 +331,11 @@ You should see something like this:
## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/48/)
## Assignment: [Train a Mountain Car](assignment.md)
## Assignment
[Train a Mountain Car](assignment.md)
## Conclusion
We have now learned how to train agents to achieve good results just by providing them a reward function that defines the desired state of the game, and by giving them an opportunity to intelligently explore the search space. We have successfully applied the Q-Learning algorithm in the cases of discrete and continuous environments, but with discrete actions.
It's important to also study situations where action state is also continuous, and when observation space is much more complex, such as the image from the Atari game screen. In those problems we often need to use more powerful machine learning techniques, such as neural networks, in order to achieve good results. Those more advanced topics are the subject of our forthcoming more advanced AI course.
It's important to also study situations where action state is also continuous, and when observation space is much more complex, such as the image from the Atari game screen. In those problems we often need to use more powerful machine learning techniques, such as neural networks, in order to achieve good results. Those more advanced topics are the subject of our forthcoming more advanced AI course.

@ -19,16 +19,14 @@ The finance sector offers many opportunities for machine learning. Many problems
We learned about [k-means clustering](../../5-Clustering/2-K-Means/README.md) earlier in the course, but how can it be used to solve problems related to credit card fraud?
K-means clustering comes in handy during a credit card fraud detection technique called **outlier detection**. Outliers, or deviations in observations about a set of data, can tell us if a credit card is being used in a normal capacity or if something unusual is going on. As shown in the paper linked below, you can sort credit card data using a k-means clustering algorithm and assign each transaction to a cluster based on how much of an outlier it appears to be. Then, you can evaluate the riskiest clusters for fraudulent versus legitimate transactions.
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.680.1195&rep=rep1&type=pdf
[Reference](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.680.1195&rep=rep1&type=pdf)
### Wealth management
In wealth management, an individual or firm handles investments on behalf of their clients. Their job is to sustain and grow wealth in the long-term, so it is essential to choose investments that perform well.
One way to evaluate how a particular investment performs is through statistical regression. [Linear regression](../../2-Regression/1-Tools/README.md) is a valuable tool for understanding how a fund performs relative to some benchmark. We can also deduce whether or not the results of the regression are statistically significant, or how much they would affect a client's investments. You could even further expand your analysis using multiple regression, where additional risk factors can be taken into account. For an example of how this would work for a specific fund, check out the paper below on evaluating fund performance using regression.
http://www.brightwoodventures.com/evaluating-fund-performance-using-regression/
[Reference](http://www.brightwoodventures.com/evaluating-fund-performance-using-regression/)
## 🎓 Education
@ -37,14 +35,12 @@ The educational sector is also a very interesting area where ML can be applied.
### Predicting student behavior
[Coursera](https://coursera.com), an online open course provider, has a great tech blog where they discuss many engineering decisions. In this case study, they plotted a regression line to try to explore any correlation between a low NPS (Net Promoter Score) rating and course retention or drop-off.
https://medium.com/coursera-engineering/controlled-regression-quantifying-the-impact-of-course-quality-on-learner-retention-31f956bd592a
[Reference](https://medium.com/coursera-engineering/controlled-regression-quantifying-the-impact-of-course-quality-on-learner-retention-31f956bd592a)
### Mitigating bias
[Grammarly](https://grammarly.com), a writing assistant that checks for spelling and grammar errors, uses sophisticated [natural language processing systems](../../6-NLP/README.md) throughout its products. They published an interesting case study in their tech blog about how they dealt with gender bias in machine learning, which you learned about in our [introductory fairness lesson](../../1-Introduction/3-fairness/README.md).
https://www.grammarly.com/blog/engineering/mitigating-gender-bias-in-autocorrect/
[Reference](https://www.grammarly.com/blog/engineering/mitigating-gender-bias-in-autocorrect/)
## 👜 Retail
@ -53,14 +49,12 @@ The retail sector can definitely benefit from the use of ML, with everything fro
### Personalizing the customer journey
At Wayfair, a company that sells home goods like furniture, helping customers find the right products for their taste and needs is paramount. In this article, engineers from the company describe how they use ML and NLP to "surface the right results for customers". Notably, their Query Intent Engine has been built to use entity extraction, classifier training, asset and opinion extraction, and sentiment tagging on customer reviews. This is a classic use case of how NLP works in online retail.
https://www.aboutwayfair.com/tech-innovation/how-we-use-machine-learning-and-natural-language-processing-to-empower-search
[Reference](https://www.aboutwayfair.com/tech-innovation/how-we-use-machine-learning-and-natural-language-processing-to-empower-search)
### Inventory management
Innovative, nimble companies like [StitchFix](https://stitchfix.com), a box service that ships clothing to consumers, rely heavily on ML for recommendations and inventory management. Their styling teams work together with their merchandising teams, in fact: "one of our data scientists tinkered with a genetic algorithm and applied it to apparel to predict what would be a successful piece of clothing that doesn't exist today. We brought that to the merchandise team and now they can use that as a tool."
https://www.zdnet.com/article/how-stitch-fix-uses-machine-learning-to-master-the-science-of-styling/
[Reference](https://www.zdnet.com/article/how-stitch-fix-uses-machine-learning-to-master-the-science-of-styling/)
## 🏥 Health Care
@ -69,20 +63,17 @@ The health care sector can leverage ML to optimize research tasks and also logis
### Managing clinical trials
Toxicity in clinical trials is a major concern to drug makers. How much toxicity is tolerable? In this study, analyzing various clinical trial methods led to the development of a new approach for predicting the odds of clinical trial outcomes. Specifically, they were able to use random forest to produce a [classifier](../../4-Classification/README.md) that is able to distinguish between groups of drugs.
https://www.sciencedirect.com/science/article/pii/S2451945616302914
[Reference](https://www.sciencedirect.com/science/article/pii/S2451945616302914)
### Hospital readmission management
Hospital care is costly, especially when patients have to be readmitted. This paper discusses a company that uses ML to predict readmission potential using [clustering](../../5-Clustering/README.md) algorithms. These clusters help analysts to "discover groups of readmissions that may share a common cause".
https://healthmanagement.org/c/healthmanagement/issuearticle/hospital-readmissions-and-machine-learning
[Reference](https://healthmanagement.org/c/healthmanagement/issuearticle/hospital-readmissions-and-machine-learning)
### Disease management
The recent pandemic has shone a bright light on the ways that machine learning can aid in stopping the spread of disease. In this article, you'll recognize the use of ARIMA, logistic curves, linear regression, and SARIMA. "This work is an attempt to calculate the rate of spread of this virus and thus to predict the deaths, recoveries, and confirmed cases, so that it may help us to prepare better and survive."
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7979218/
[Reference](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7979218/)
## 🌲 Ecology and Green Tech
@ -93,22 +84,19 @@ Nature and ecology consists of many sensitive systems where the interplay betwee
You learned about [Reinforcement Learning](../../8-Reinforcement/README.md) in previous lessons. It can be very useful when trying to predict patterns in nature. In particular, it can be used to track ecological problems like forest fires and the spread of invasive species. In Canada, a group of researchers used Reinforcement Learning to build forest wildfire dynamics models from satellite images. Using an innovative "spatially spreading process (SSP)", they envisioned a forest fire as "the agent at any cell in the landscape." "The set of actions the fire can take from a location at any point in time includes spreading north, south, east, or west or not spreading.
This approach inverts the usual RL setup since the dynamics of the corresponding Markov Decision Process (MDP) is a known function for immediate wildfire spread." Read more about the classic algorithms used by this group at the link below.
https://www.frontiersin.org/articles/10.3389/fict.2018.00006/full
[Reference](https://www.frontiersin.org/articles/10.3389/fict.2018.00006/full)
### Motion sensing of animals
While deep learning has created a revolution in visually tracking animal movements (you can build your own [polar bear tracker](https://docs.microsoft.com/learn/modules/build-ml-model-with-azure-stream-analytics/?WT.mc_id=academic-77952-leestott) here), classic ML still has a place in this task.
Sensors to track movements of farm animals and IoT make use of this type of visual processing, but more basic ML techniques are useful to preprocess data. For example, in this paper, sheep postures were monitored and analyzed using various classifier algorithms. You might recognize the ROC curve on page 335.
https://druckhaus-hofmann.de/gallery/31-wj-feb-2020.pdf
[Reference](https://druckhaus-hofmann.de/gallery/31-wj-feb-2020.pdf)
### ⚡️ Energy Management
In our lessons on [time series forecasting](../../7-TimeSeries/README.md), we invoked the concept of smart parking meters to generate revenue for a town based on understanding supply and demand. This article discusses in detail how clustering, regression and time series forecasting combined to help predict future energy use in Ireland, based off of smart metering.
https://www-cdn.knime.com/sites/default/files/inline-images/knime_bigdata_energy_timeseries_whitepaper.pdf
[Reference](https://www-cdn.knime.com/sites/default/files/inline-images/knime_bigdata_energy_timeseries_whitepaper.pdf)
## 💼 Insurance
@ -117,8 +105,7 @@ The insurance sector is another sector that uses ML to construct and optimize vi
### Volatility Management
MetLife, a life insurance provider, is forthcoming with the way they analyze and mitigate volatility in their financial models. In this article you'll notice binary and ordinal classification visualizations. You'll also discover forecasting visualizations.
https://investments.metlife.com/content/dam/metlifecom/us/investments/insights/research-topics/macro-strategy/pdf/MetLifeInvestmentManagement_MachineLearnedRanking_070920.pdf
[Reference](https://investments.metlife.com/content/dam/metlifecom/us/investments/insights/research-topics/macro-strategy/pdf/MetLifeInvestmentManagement_MachineLearnedRanking_070920.pdf)
## 🎨 Arts, Culture, and Literature
@ -127,8 +114,7 @@ In the arts, for example in journalism, there are many interesting problems. Det
### Fake news detection
Detecting fake news has become a game of cat and mouse in today's media. In this article, researchers suggest that a system combining several of the ML techniques we have studied can be tested and the best model deployed: "This system is based on natural language processing to extract features from the data and then these features are used for the training of machine learning classifiers such as Naive Bayes, Support Vector Machine (SVM), Random Forest (RF), Stochastic Gradient Descent (SGD), and Logistic Regression(LR)."
https://www.irjet.net/archives/V7/i6/IRJET-V7I6688.pdf
[Reference](https://www.irjet.net/archives/V7/i6/IRJET-V7I6688.pdf)
This article shows how combining different ML domains can produce interesting results that can help stop fake news from spreading and creating real damage; in this case, the impetus was the spread of rumors about COVID treatments that incited mob violence.
@ -137,16 +123,14 @@ This article shows how combining different ML domains can produce interesting re
Museums are at the cusp of an AI revolution in which cataloging and digitizing collections and finding links between artifacts is becoming easier as technology advances. Projects such as [In Codice Ratio](https://www.sciencedirect.com/science/article/abs/pii/S0306457321001035#:~:text=1.,studies%20over%20large%20historical%20sources.) are helping unlock the mysteries of inaccessible collections such as the Vatican Archives. But, the business aspect of museums benefits from ML models as well.
For example, the Art Institute of Chicago built models to predict what audiences are interested in and when they will attend expositions. The goal is to create individualized and optimized visitor experiences each time the user visits the museum. "During fiscal 2017, the model predicted attendance and admissions within 1 percent of accuracy, says Andrew Simnick, senior vice president at the Art Institute."
https://www.chicagobusiness.com/article/20180518/ISSUE01/180519840/art-institute-of-chicago-uses-data-to-make-exhibit-choices
[Reference](https://www.chicagobusiness.com/article/20180518/ISSUE01/180519840/art-institute-of-chicago-uses-data-to-make-exhibit-choices)
## 🏷 Marketing
### Customer segmentation
The most effective marketing strategies target customers in different ways based on various groupings. In this article, the uses of Clustering algorithms are discussed to support differentiated marketing. Differentiated marketing helps companies improve brand recognition, reach more customers, and make more money.
https://ai.inqline.com/machine-learning-for-marketing-customer-segmentation/
[Reference](https://ai.inqline.com/machine-learning-for-marketing-customer-segmentation/)
## 🚀 Challenge

@ -100,11 +100,11 @@ By ensuring that the content aligns with projects, the process is made more enga
| 08 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Build a logistic regression model | <ul><li>[Python](2-Regression/4-Logistic/README.md) </li><li>[R](2-Regression/4-Logistic/solution/R/lesson_4-R.ipynb)</li></ul> | <ul><li>Jen</li><li>Eric Wanjau</li></ul> |
| 09 | A Web App 🔌 | [Web App](3-Web-App/README.md) | Build a web app to use your trained model | [Python](3-Web-App/1-Web-App/README.md) | Jen |
| 10 | Introduction to classification | [Classification](4-Classification/README.md) | Clean, prep, and visualize your data; introduction to classification | <ul><li> [Python](4-Classification/1-Introduction/README.md) </li><li>[R](4-Classification/1-Introduction/solution/R/lesson_10-R.ipynb) | <ul><li>Jen and Cassie</li><li>Eric Wanjau</li></ul> |
| 11 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | Introduction to classifiers | <ul><li> [Python](4-Classification/2-Classifiers-1/README.md)</li><li>[R](4-Classification/2-Classifiers-1/solution/R/lesson_11-R.ipynb) | <ul><li>Jen and Cassie</li><li>Eric Wanjau</li></ul> |
| 12 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | More classifiers | <ul><li> [Python](4-Classification/3-Classifiers-2/README.md)</li><li>[R](4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb) | <ul><li>Jen and Cassie</li><li>Eric Wanjau</li></ul> |
| 11 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | Introduction to classifiers | <ul><li> [Python](4-Classification/2-Classifiers-1/README.md)</li><li>[R](4-Classification/2-Classifiers-1/solution/R/lesson_11.html) | <ul><li>Jen and Cassie</li><li>Eric Wanjau</li></ul> |
| 12 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | More classifiers | <ul><li> [Python](4-Classification/3-Classifiers-2/README.md)</li><li>[R](4-Classification/3-Classifiers-2/solution/R/lesson_12.html) | <ul><li>Jen and Cassie</li><li>Eric Wanjau</li></ul> |
| 13 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | Build a recommender web app using your model | [Python](4-Classification/4-Applied/README.md) | Jen |
| 14 | Introduction to clustering | [Clustering](5-Clustering/README.md) | Clean, prep, and visualize your data; Introduction to clustering | <ul><li> [Python](5-Clustering/1-Visualize/README.md)</li><li>[R](5-Clustering/1-Visualize/solution/R/lesson_14-R.ipynb) | <ul><li>Jen</li><li>Eric Wanjau</li></ul> |
| 15 | Exploring Nigerian Musical Tastes 🎧 | [Clustering](5-Clustering/README.md) | Explore the K-Means clustering method | <ul><li> [Python](5-Clustering/2-K-Means/README.md)</li><li>[R](5-Clustering/2-K-Means/solution/R/lesson_15-R.ipynb) | <ul><li>Jen</li><li>Eric Wanjau</li></ul> |
| 14 | Introduction to clustering | [Clustering](5-Clustering/README.md) | Clean, prep, and visualize your data; Introduction to clustering | <ul><li> [Python](5-Clustering/1-Visualize/README.md)</li><li>[R](5-Clustering/1-Visualize/solution/R/lesson_14.html) | <ul><li>Jen</li><li>Eric Wanjau</li></ul> |
| 15 | Exploring Nigerian Musical Tastes 🎧 | [Clustering](5-Clustering/README.md) | Explore the K-Means clustering method | <ul><li> [Python](5-Clustering/2-K-Means/README.md)</li><li>[R](5-Clustering/2-K-Means/solution/R/lesson_15.html) | <ul><li>Jen</li><li>Eric Wanjau</li></ul> |
| 16 | Introduction to natural language processing ☕️ | [Natural language processing](6-NLP/README.md) | Learn the basics about NLP by building a simple bot | [Python](6-NLP/1-Introduction-to-NLP/README.md) | Stephen |
| 17 | Common NLP Tasks ☕️ | [Natural language processing](6-NLP/README.md) | Deepen your NLP knowledge by understanding common tasks required when dealing with language structures | [Python](6-NLP/2-Tasks/README.md) | Stephen |
| 18 | Translation and sentiment analysis ♥️ | [Natural language processing](6-NLP/README.md) | Translation and sentiment analysis with Jane Austen | [Python](6-NLP/3-Translation-Sentiment/README.md) | Stephen |

Loading…
Cancel
Save