diff --git a/8-Reinforcement/1-QLearning/README.md b/8-Reinforcement/1-QLearning/README.md
index bfa07ffe..6301c46e 100644
--- a/8-Reinforcement/1-QLearning/README.md
+++ b/8-Reinforcement/1-QLearning/README.md
@@ -229,8 +229,7 @@ We are now ready to implement the learning algorithm. Before we do that, we also
 
 We add a few `eps` to the original vector in order to avoid division by 0 in the initial case, when all components of the vector are identical.
 
 Run them learning algorithm through 5000 experiments, also called **epochs**: (code block 8)
-
-    ```python
+```python
 for epoch in range(5000):
     # Pick initial point
@@ -255,11 +254,11 @@ Run them learning algorithm through 5000 experiments, also called **epochs**: (c
         ai = action_idx[a]
         Q[x,y,ai] = (1 - alpha) * Q[x,y,ai] + alpha * (r + gamma * Q[x+dpos[0], y+dpos[1]].max())
         n+=1
-    ```
+```
 
-    After executing this algorithm, the Q-Table should be updated with values that define the attractiveness of different actions at each step. We can try to visualize the Q-Table by plotting a vector at each cell that will point in the desired direction of movement. For simplicity, we draw a small circle instead of an arrow head.
+After executing this algorithm, the Q-Table should be updated with values that define the attractiveness of different actions at each step. We can try to visualize the Q-Table by plotting a vector at each cell that will point in the desired direction of movement. For simplicity, we draw a small circle instead of an arrow head.
 
-
+
 
 ## Checking the policy
 
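
The second hunk above touches the core tabular Q-learning update, `Q[x,y,ai] = (1 - alpha) * Q[x,y,ai] + alpha * (r + gamma * Q[x+dpos[0], y+dpos[1]].max())`. A minimal self-contained sketch of that update rule follows; the grid size, action list, and `alpha`/`gamma` values here are illustrative assumptions, not the lesson's actual ones:

```python
import numpy as np

# Illustrative setup (hypothetical values, not the lesson's):
# a small grid world with a Q-Table indexed as Q[x, y, action_index].
width, height = 4, 4
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # moves along x and y
alpha, gamma = 0.3, 0.9                        # learning rate, discount factor

Q = np.zeros((width, height, len(actions)))

def update(x, y, ai, r):
    """Apply one Q-learning step from state (x, y) taking action index ai."""
    dpos = actions[ai]
    nx, ny = x + dpos[0], y + dpos[1]
    # Blend the old estimate with the observed reward plus the
    # discounted value of the best action in the successor state,
    # mirroring the update line in the diff.
    Q[x, y, ai] = (1 - alpha) * Q[x, y, ai] + alpha * (r + gamma * Q[nx, ny].max())

update(1, 1, 3, r=1.0)  # one step from (1, 1) to (1, 2) with reward 1
print(Q[1, 1, 3])
```

With an all-zero table, a single rewarded step moves the entry toward `alpha * r`; repeated epochs of such updates are what fill the Q-Table that the README then visualizes.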