
# Train Mountain Car

OpenAI Gym is designed so that all environments share the same API: the same `reset`, `step`, and `render` methods, and the same abstractions for the action space and observation space. This makes it possible to adapt the same reinforcement learning algorithms to different environments with minimal code changes.
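
The snippet below is a minimal sketch of that shared API, assuming the classic `gym` package used in the lesson notebook (newer `gymnasium` releases return an extra value from `reset` and five values from `step`); it runs a single episode of `MountainCar-v0` with random actions.

```python
import gym

# The same calls work for CartPole-v1, MountainCar-v0, and other environments.
env = gym.make("MountainCar-v0")

obs = env.reset()          # initial observation: [position, velocity]
done = False
total_reward = 0

while not done:
    action = env.action_space.sample()           # pick a random action: 0, 1, or 2
    obs, reward, done, info = env.step(action)   # advance the simulation one step
    total_reward += reward

env.close()
print(f"Episode finished with total reward {total_reward}")
```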

## Mountain Car Environment

The Mountain Car environment involves a car stuck in a valley.

The goal is to get out of the valley and reach the flag by performing one of the following actions at each step:

| Value | Meaning                 |
| ----- | ----------------------- |
| 0     | Accelerate to the left  |
| 1     | Do not accelerate       |
| 2     | Accelerate to the right |

The main challenge of this problem is that the car's engine is not powerful enough to climb the mountain in a single attempt. Therefore, the only way to succeed is to drive back and forth to build up momentum.

The observation space consists of just two values:

| Num | Observation  | Min   | Max  |
| --- | ------------ | ----- | ---- |
| 0   | Car Position | -1.2  | 0.6  |
| 1   | Car Velocity | -0.07 | 0.07 |
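
If you want to confirm these ranges programmatically, both tables can be inspected from the environment itself; the sketch below assumes the classic `gym` API.

```python
import gym

env = gym.make("MountainCar-v0")

print(env.action_space)             # a discrete space with the three actions above
print(env.observation_space.low)    # lower bounds of position and velocity
print(env.observation_space.high)   # upper bounds of position and velocity
```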

The reward system for the mountain car is somewhat tricky:

- A reward of 0 is given if the agent reaches the flag (position = 0.5) at the top of the mountain.
- A reward of -1 is given if the agent's position is less than 0.5.

The episode ends if the car's position exceeds 0.5 or if the episode length exceeds 200 steps.

## Instructions

Adapt our reinforcement learning algorithm to solve the mountain car problem. Start with the existing notebook.ipynb code, substitute the new environment, modify the state discretization functions, and try to train the existing algorithm with minimal code changes. Optimize the results by adjusting hyperparameters.

Note: Adjusting hyperparameters will likely be necessary to make the algorithm converge.
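
As a starting point for the discretization step, one possible approach is simple binning of the two observations, as in the sketch below; the bin counts and the use of `np.digitize` are illustrative assumptions rather than the notebook's exact implementation, and will likely need tuning.

```python
import numpy as np

# Illustrative bin edges covering the observation ranges (position, velocity).
position_bins = np.linspace(-1.2, 0.6, 20)
velocity_bins = np.linspace(-0.07, 0.07, 20)

def discretize(obs):
    """Map a continuous (position, velocity) observation to a discrete tuple
    that can be used as a Q-Table key."""
    position, velocity = obs
    return (
        int(np.digitize(position, position_bins)),
        int(np.digitize(velocity, velocity_bins)),
    )

# Example: a state near the bottom of the valley with zero velocity
print(discretize((-0.5, 0.0)))
```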

## Rubric

| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | --------- | -------- | ----------------- |
|          | The Q-Learning algorithm is successfully adapted from the CartPole example with minimal code modifications and is able to solve the problem of capturing the flag in under 200 steps. | A new Q-Learning algorithm is adopted from the Internet but is well-documented, or the existing algorithm is adapted but does not achieve the desired results. | The student was unable to successfully adopt any algorithm but made substantial progress toward a solution (e.g., implemented state discretization, the Q-Table data structure, etc.). |

Disclaimer:
This document has been translated using the AI translation service Co-op Translator. While we strive for accuracy, please note that automated translations may contain errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is recommended. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.