diff --git a/8-Reinforcement/1-QLearning/README.md b/8-Reinforcement/1-QLearning/README.md
index bfa07ffe..6301c46e 100644
--- a/8-Reinforcement/1-QLearning/README.md
+++ b/8-Reinforcement/1-QLearning/README.md
@@ -229,8 +229,7 @@ We are now ready to implement the learning algorithm. Before we do that, we also
We add a small `eps` to the original vector in order to avoid division by 0 in the initial case, when all components of the vector are identical.
Run the learning algorithm through 5000 experiments, also called **epochs**: (code block 8)
-
- ```python
+```python
for epoch in range(5000):
# Pick initial point
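
For context on the `eps` mentioned above: it comes from the probability-normalization helper used to turn a row of the Q-Table into action probabilities. A minimal sketch, assuming a NumPy vector `v` (the helper name `probs` and the default `1e-4` follow the lesson's earlier code blocks and may differ in your copy):

```python
import numpy as np

def probs(v, eps=1e-4):
    # Shift so the smallest component becomes eps rather than 0.
    # Without eps, an all-equal vector (the initial Q-Table state)
    # would normalize to all zeros and v.sum() would be 0,
    # causing division by zero below.
    v = v - v.min() + eps
    return v / v.sum()
```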
@@ -255,11 +254,11 @@ Run the learning algorithm through 5000 experiments, also called **epochs**: (c
ai = action_idx[a]
Q[x,y,ai] = (1 - alpha) * Q[x,y,ai] + alpha * (r + gamma * Q[x+dpos[0], y+dpos[1]].max())
n+=1
- ```
+```
- After executing this algorithm, the Q-Table should be updated with values that define the attractiveness of different actions at each step. We can try to visualize the Q-Table by plotting a vector at each cell that will point in the desired direction of movement. For simplicity, we draw a small circle instead of an arrow head.
+After executing this algorithm, the Q-Table should be updated with values that define the attractiveness of different actions at each step. We can try to visualize the Q-Table by plotting a vector at each cell that will point in the desired direction of movement. For simplicity, we draw a small circle instead of an arrowhead.
-
+
## Checking the policy
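
To make the visualization step described above concrete, here is a minimal sketch that plots the preferred move per cell, with a small circle marking the cell. The `actions` dictionary, the Q-Table shape, and the random stand-in values are illustrative assumptions, not the lesson's exact plotting code:

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative assumptions: four moves and a stand-in 8x8 "trained" Q-Table
actions = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}
Q = np.random.rand(8, 8, len(actions))

fig, ax = plt.subplots()
for x in range(Q.shape[0]):
    for y in range(Q.shape[1]):
        # The most attractive action at (x, y) is the argmax over its Q values
        dx, dy = list(actions.values())[Q[x, y].argmax()]
        ax.plot(x, y, "ko", markersize=3)                    # circle at the cell
        ax.plot([x, x + 0.4 * dx], [y, y + 0.4 * dy], "k-")  # desired direction
ax.invert_yaxis()
plt.show()
```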