few edits for RL

3 years ago · b266f1e858
parent 13d2a03710
commit b266f1e858
3 changed files with 17 additions and 14 deletions
--- a/8-Reinforcement/1-QLearning/README.md
+++ b/8-Reinforcement/1-QLearning/README.md
@@ -1,4 +1,8 @@
 # Introduction to Reinforcement Learning and Q-Learning
+
+[![Intro to Reinforcement Learning](https://img.youtube.com/vi/lDq_en8RNOo/0.jpg)](https://www.youtube.com/watch?v=lDq_en8RNOo)
+
+> 🎥 Click the image above to hear Dmitry discuss Reinforcement Learning
 ## [Pre-lecture quiz](link-to-quiz-app)

 In this lesson, we will explore the world of **[Peter and the Wolf](https://en.wikipedia.org/wiki/Peter_and_the_Wolf)**, inspired by a musical fairy tale by a Russian composer, [Sergei Prokofiev](https://en.wikipedia.org/wiki/Sergei_Prokofiev). We will use **Reinforcement Learning** to let Peter explore his environment, collect tasty apples and avoid meeting the wolf.
--- a/8-Reinforcement/README.md
+++ b/8-Reinforcement/README.md
@@ -1,16 +1,19 @@
 # Getting Started with Reinforcement Learning

-[![Intro to Reinforcement Learning](https://img.youtube.com/vi/lDq_en8RNOo/0.jpg)](https://www.youtube.com/watch?v=lDq_en8RNOo)
+[![Peter and the Wolf](https://img.youtube.com/vi/Fmi5zHg4QSM/0.jpg)](https://www.youtube.com/watch?v=Fmi5zHg4QSM)

+> 🎥 Click the image above to listen to Peter and the Wolf by Prokofiev
 ## Regional Topic: Peter and the Wolf (Russia)

 [Peter and the Wolf](https://en.wikipedia.org/wiki/Peter_and_the_Wolf) is a musical fairy tale written by a Russian composer [Sergei Prokofiev](https://en.wikipedia.org/wiki/Sergei_Prokofiev). It is a story about young pioneer Peter, who bravely goes out of his house to the forest clearing to chase the wolf. In this section, we will train machine learning algorithms that will help Peter:
-* to explore the surroinding area and build an optimal navigation map
-* to learn how to use a skateboard and balance on it, in order to move around faster.
+
+- **Explore** the surrounding area and build an optimal navigation map
+- **Learn** how to use a skateboard and balance on it, in order to move around faster.

 ## Introduction to Reinforcement Learning

 In previous sections, you have seen two examples of machine learning problems:
+
 * **Supervised**, where we had some datasets that show sample solutions to the problem we want to solve. [Classification][Classification] and [regression][Regression] are supervised learning tasks.
 * **Unsupervised**, in which we do not have labeled training data. The main example of unsupervised learning is [clustering][Clustering].

@@ -21,21 +24,17 @@ In this section, we will introduce you to a new type of learning problems, which

 Suppose you want to teach a computer to play a game, such as chess or [Super Mario](https://en.wikipedia.org/wiki/Super_Mario). For the computer to play a game, we need it to predict which move to make in each game state. While this may seem like a classification problem, it is not, because we do not have a dataset of states and corresponding actions. Even if we have some data like that (existing chess matches, or recordings of players playing Super Mario), it is unlikely to cover a sufficiently large number of possible states.

-Instead of looking for existing game data, **reinforcement learning** (RL) is based on the idea of *making computer play* many times, observing the result. Thus, to apply reinforcement learning, we need two things:
+Instead of looking for existing game data, **reinforcement learning** (RL) is based on the idea of *making the computer play* many times, observing the result. Thus, to apply reinforcement learning, we need two things:
 1. **An environment** and **a simulator**, which would allow us to play a game many times. This simulator would define all game rules, possible states and actions.
 2. **A reward function**, which would tell us how well we did during each move or game.

-The main difference between supervised learning is that in RL we typically do not know whether we win or lose until we finish the game. Thus, we cannot say whether a certain move alone is good or now - we only receive reward at the end of the game. And our goal is to design such algorightms that will allow us to train a model under such uncertain conditions. We will learn about one RL algorithm called **Q-learning**.
+The main difference from supervised learning is that in RL we typically do not know whether we win or lose until we finish the game. Thus, we cannot say whether a certain move alone is good or not; we only receive a reward at the end of the game. Our goal is to design algorithms that allow us to train a model under such uncertain conditions. We will learn about one RL algorithm called **Q-learning**.

 ## Lessons

-1. [Introduction to Reinforcement Learning and Q-Learning](1-qlearning/README.md)
-2. [Using gym simulation environment](2-gym/README.md)
+1. [Introduction to Reinforcement Learning and Q-Learning](1-QLearning/README.md)
+2. [Using gym simulation environment](2-Gym/README.md)

 ## Credits

-"Introduction to" was written with ♥️ by [Dmitry Soshnikov](http://soshnikov.com)
-
-[Classification]: ../4-Classification/README.md
-[Regression]: ../2-Regression/README.md
-[Clustering]: ../5-Clustering/README.md
+"Introduction to Reinforcement Learning" was written with ♥️ by [Dmitry Soshnikov](http://soshnikov.com)
--- a/README.md
+++ b/README.md
@@ -92,8 +92,8 @@ By ensuring that the content aligns with projects, the process is made more enga
 |      19       |                Romantic Hotels of Europe ♥️                 |   [Natural Language Processing](6-NLP/README.md)    | Sentiment analysis, continued                                                                                                   |                      [lesson]()                       |   Stephen   |
 |      20       |          Introduction to Time Series Forecasting           |        [Time Series](7-TimeSeries/README.md)        | Introduction to Time Series Forecasting                                                                                         |    [lesson](7-TimeSeries/1-Introduction/README.md)    |  Francesca  |
 |      21       | ⚡️ World Power Usage ⚡️ Time Series Forecasting with ARIMA ⚡️ |        [Time Series](7-TimeSeries/README.md)        | Time Series Forecasting with ARIMA                                                                                              |       [lesson](7-TimeSeries/2-ARIMA/README.md)        |  Francesca  |
-|      22       |           Introduction to Reinforcement Learning           | [Reinforcement Learning](8-Reinforcement/README.md) | tbd                                                                                                                             |                      [lesson]()                       |   Dmitry    |
-|      23       |                Help Peter avoid the Wolf! 🐺                | [Reinforcement Learning](8-Reinforcement/README.md) | tbd                                                                                                                             |                      [lesson]()                       |   Dmitry    |
+|      22       |           Introduction to Reinforcement Learning           | [Reinforcement Learning](8-Reinforcement/README.md) | Introduction to Reinforcement Learning with Q-Learning                                                                          |    [lesson](8-Reinforcement/1-QLearning/README.md)    |   Dmitry    |
+|      23       |                Help Peter avoid the Wolf! 🐺                | [Reinforcement Learning](8-Reinforcement/README.md) | Reinforcement Learning Gym                                                                                                      |       [lesson](8-Reinforcement/2-Gym/README.md)       |   Dmitry    |
 |      24       |          Real-World ML Scenarios and Applications          |      [ML in the Wild](9-Real-World/README.md)       | Interesting and Revealing real-world applications of classical ML                                                               |    [lesson](9-Real-World/1-Applications/README.md)    |    Team     |
 ## Offline access