@ -1,55 +0,0 @@
# [Lesson Topic]

Add a sketchnote if possible/appropriate

![Embed a video here if available](video-url)

## [Pre-lecture quiz](link-to-quiz-app)

Describe what we will learn

### Introduction

Describe what will be covered

> Notes

### Prerequisite

What steps should have been covered before this lesson?

### Preparation

Preparatory steps to start this lesson

---

[Step through content in blocks]

## [Topic 1]

### Task:

Work together to progressively enhance your codebase to build the project with shared code:

```html
code blocks
```

✅ Knowledge Check - use this moment to stretch students' knowledge with open questions

## [Topic 2]

## [Topic 3]

## 🚀Challenge

Add a challenge for students to work on collaboratively in class to enhance the project

Optional: add a screenshot of the completed lesson's UI if appropriate

## [Post-lecture quiz](link-to-quiz-app)

## Review & Self Study

## Assignment [Assignment Name](assignment.md)
@ -0,0 +1,25 @@
# More Realistic Peter and the Wolf World

In our situation, Peter was able to move around almost without getting tired or hungry. In a more realistic world, he has to sit down and rest from time to time, and also to feed himself. Let's make our world more realistic by implementing the following rules:

1. By moving from one place to another, Peter loses **energy** and gains some **fatigue**.
2. Peter can gain more energy by eating apples.
3. Peter can get rid of fatigue by resting under a tree or on the grass (i.e. walking into a board location with a tree or grass - a green field).
4. Peter needs to find and kill the wolf.
5. In order to kill the wolf, Peter needs to have certain levels of energy and fatigue, otherwise he loses the battle.

## Instructions

Use the original [MazeLearner.ipynb](MazeLearner.ipynb) notebook as a starting point for your solution.

Modify the reward function according to the rules of the game above, run the reinforcement learning algorithm to learn the best strategy for winning the game, and compare the results of a random walk with your algorithm in terms of the number of games won and lost.

> **Note**: In your new world, the state is more complex, and in addition to the human's position it also includes fatigue and energy levels. You may choose to represent the state as a tuple (Board, energy, fatigue), define a class for the state (you may also want to derive it from `Board`), or even modify the original `Board` class inside [rlboard.py](rlboard.py).
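
For concreteness, here is one possible sketch of a richer state and reward function. It is a minimal illustration assuming the `Board` class from [rlboard.py](rlboard.py); the `PeterState` class, the thresholds, and all reward values are illustrative assumptions to tune, not part of the assignment:

```python
from rlboard import Board

# Illustrative constants -- tune these while experimenting.
ENERGY_TO_FIGHT = 10    # minimum energy needed to beat the wolf
FATIGUE_TO_FIGHT = 5    # maximum fatigue allowed when fighting

class PeterState:
    """Hypothetical state: the board (with Peter's position) plus energy and fatigue."""
    def __init__(self, board, energy=10, fatigue=0):
        self.board = board
        self.energy = energy
        self.fatigue = fatigue

def reward(state):
    """One possible reward function implementing rules 1-5 above."""
    cell = state.board.at()           # cell type at Peter's current position
    state.energy -= 1                 # rule 1: every move costs energy...
    state.fatigue += 1                # ...and adds fatigue
    if cell == Board.Cell.apple:      # rule 2: apples restore energy
        state.energy += 5
        return 0.1
    if cell == Board.Cell.tree:       # rule 3: resting removes fatigue
        state.fatigue = 0
        return 0.1
    if cell == Board.Cell.wolf:       # rules 4-5: fight the wolf
        if state.energy >= ENERGY_TO_FIGHT and state.fatigue <= FATIGUE_TO_FIGHT:
            return 10                 # rested and strong enough: battle won
        return -10                    # battle lost
    return -0.1                       # small penalty for any other move
```

Mixing the state update into the reward function keeps the sketch short; in your solution you may prefer to separate the transition logic from the reward itself.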

In your solution, please keep the code responsible for the random walk strategy, and compare the results of your algorithm with the random walk at the end.

> **Note**: You may need to adjust hyperparameters to make it work, especially the number of epochs. Because success in the game (fighting the wolf) is a rare event, you can expect a much longer training time.

## Rubric

| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | --------- | -------- | ----------------- |
| | A notebook is presented with the definition of the new world rules, the Q-Learning algorithm, and some textual explanations. Q-Learning is able to significantly improve the results compared to a random walk. | A notebook is presented, Q-Learning is implemented and improves results compared to a random walk, but not significantly; or the notebook is poorly documented and the code is not well-structured. | Some attempt to re-define the rules of the world is made, but the Q-Learning algorithm does not work, or the reward function is not fully defined. |
@ -0,0 +1,194 @@
# Maze simulation environment for Reinforcement Learning tutorial
# by Dmitry Soshnikov
# http://soshnikov.com

import matplotlib.pyplot as plt
import numpy as np
import cv2
import random
import math

def clip(min, max, x):
    # clamp x to the range [min, max]
    if x < min:
        return min
    if x > max:
        return max
    return x

def imload(fname, size):
    # load an image, convert BGR -> RGB, resize to size x size, normalize to [0, 1]
    img = cv2.imread(fname)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (size, size), interpolation=cv2.INTER_LANCZOS4)
    img = img / np.max(img)
    return img

def draw_line(dx, dy, size=50):
    # draw a small arrow glyph (a line with a circle at its head) pointing
    # in the direction (dx, dy); used to visualize a policy on the board
    p = np.ones((size-2, size-2, 3))
    if dx == 0:
        dx = 0.001
    m = (size-2)//2
    l = math.sqrt(dx*dx+dy*dy)*(size-4)/2
    a = math.atan(dy/dx)
    cv2.line(p, (int(m-l*math.cos(a)), int(m-l*math.sin(a))), (int(m+l*math.cos(a)), int(m+l*math.sin(a))), (0, 0, 0), 1)
    s = -1 if dx < 0 else 1
    cv2.circle(p, (int(m+s*l*math.cos(a)), int(m+s*l*math.sin(a))), 3, 0)
    return p

def probs(v):
    # normalize a vector of values into a probability distribution
    v = v - v.min()
    if v.sum() > 0:
        v = v / v.sum()
    return v

class Board:
    class Cell:
        empty = 0
        water = 1
        wolf = 2
        tree = 3
        apple = 4

    def __init__(self, width, height, size=50):
        self.width = width
        self.height = height
        self.size = size + 2
        self.matrix = np.zeros((width, height))
        self.grid_color = (0.6, 0.6, 0.6)
        self.background_color = (1.0, 1.0, 1.0)
        self.grid_thickness = 1
        self.grid_line_type = cv2.LINE_AA
        self.pics = {
            "wolf": imload('images/wolf.png', size-4),
            "apple": imload('images/apple.png', size-4),
            "human": imload('images/human.png', size-4)
        }
        self.human = (0, 0)
        self.frame_no = 0

    def randomize(self, water_size=5, num_water=3, num_wolves=1, num_trees=5, num_apples=3, seed=None):
        # populate the board: random-walk water patches, then trees,
        # wolves and apples on randomly chosen empty cells
        if seed:
            random.seed(seed)
        for _ in range(num_water):
            x = random.randint(0, self.width-1)
            y = random.randint(0, self.height-1)
            for _ in range(water_size):
                self.matrix[x, y] = Board.Cell.water
                x = clip(0, self.width-1, x+random.randint(-1, 1))
                y = clip(0, self.height-1, y+random.randint(-1, 1))
        for _ in range(num_trees):
            while True:
                x = random.randint(0, self.width-1)
                y = random.randint(0, self.height-1)
                if self.matrix[x, y] == Board.Cell.empty:
                    self.matrix[x, y] = Board.Cell.tree
                    break
        for _ in range(num_wolves):
            while True:
                x = random.randint(0, self.width-1)
                y = random.randint(0, self.height-1)
                if self.matrix[x, y] == Board.Cell.empty:
                    self.matrix[x, y] = Board.Cell.wolf
                    break
        for _ in range(num_apples):
            while True:
                x = random.randint(0, self.width-1)
                y = random.randint(0, self.height-1)
                if self.matrix[x, y] == Board.Cell.empty:
                    self.matrix[x, y] = Board.Cell.apple
                    break

    def at(self, pos=None):
        # cell type at pos, or at the human's position if pos is omitted
        if pos:
            return self.matrix[pos[0], pos[1]]
        else:
            return self.matrix[self.human[0], self.human[1]]

    def is_valid(self, pos):
        return pos[0] >= 0 and pos[0] < self.width and pos[1] >= 0 and pos[1] < self.height

    def move_pos(self, pos, dpos):
        return (pos[0] + dpos[0], pos[1] + dpos[1])

    def move(self, dpos):
        # note: callers are expected to validate the move (see walk below)
        new_pos = self.move_pos(self.human, dpos)
        self.human = new_pos

    def random_pos(self):
        x = random.randint(0, self.width-1)
        y = random.randint(0, self.height-1)
        return (x, y)

    def random_start(self):
        # place the human on a random empty cell
        while True:
            pos = self.random_pos()
            if self.at(pos) == Board.Cell.empty:
                self.human = pos
                break

    def image(self, Q=None):
        # render the board as an RGB image; if a Q-table is given, draw a
        # direction glyph on every empty cell showing the preferred action
        img = np.zeros((self.height*self.size+1, self.width*self.size+1, 3))
        img[:, :, :] = self.background_color
        # Draw the cells
        for x in range(self.width):
            for y in range(self.height):
                if (x, y) == self.human:
                    ov = self.pics['human']
                    img[self.size*y+2:self.size*y+ov.shape[0]+2, self.size*x+2:self.size*x+2+ov.shape[1], :] = np.minimum(ov, 1.0)
                    continue
                if self.matrix[x, y] == Board.Cell.water:
                    img[self.size*y:self.size*(y+1), self.size*x:self.size*(x+1), :] = (0, 0, 1.0)
                if self.matrix[x, y] == Board.Cell.wolf:
                    ov = self.pics['wolf']
                    img[self.size*y+2:self.size*y+ov.shape[0]+2, self.size*x+2:self.size*x+2+ov.shape[1], :] = np.minimum(ov, 1.0)
                if self.matrix[x, y] == Board.Cell.apple:
                    ov = self.pics['apple']
                    img[self.size*y+2:self.size*y+ov.shape[0]+2, self.size*x+2:self.size*x+2+ov.shape[1], :] = np.minimum(ov, 1.0)
                if self.matrix[x, y] == Board.Cell.tree:
                    img[self.size*y:self.size*(y+1), self.size*x:self.size*(x+1), :] = (0, 1.0, 0)
                if self.matrix[x, y] == Board.Cell.empty and Q is not None:
                    p = probs(Q[x, y])
                    dx, dy = 0, 0
                    for i, (ddx, ddy) in enumerate([(-1, 0), (1, 0), (0, -1), (0, 1)]):
                        dx += ddx*p[i]
                        dy += ddy*p[i]
                    l = draw_line(dx, dy, self.size)
                    img[self.size*y+2:self.size*y+l.shape[0]+2, self.size*x+2:self.size*x+2+l.shape[1], :] = l
        # Draw the grid
        for i in range(self.height+1):
            img[:, i*self.size] = 0.3
        for j in range(self.width+1):
            img[j*self.size, :] = 0.3
        return img

    def plot(self, Q=None):
        plt.figure(figsize=(11, 6))
        plt.imshow(self.image(Q), interpolation='hanning')

    def saveimage(self, filename, Q=None):
        # scale [0, 1] RGB to [0, 255] BGR for cv2.imwrite
        cv2.imwrite(filename, 255*self.image(Q)[..., ::-1])

    def walk(self, policy, save_to=None, start=None):
        # follow the policy until reaching an apple (returns the number of
        # steps taken) or stepping on a wolf or into water (returns -1)
        n = 0
        if start:
            self.human = start
        else:
            self.random_start()
        while True:
            if save_to:
                self.saveimage(save_to.format(self.frame_no))
                self.frame_no += 1
            if self.at() == Board.Cell.apple:
                return n  # success!
            if self.at() in [Board.Cell.wolf, Board.Cell.water]:
                return -1  # eaten by wolf or drowned
            while True:
                a = policy(self)
                new_pos = self.move_pos(self.human, a)
                if self.is_valid(new_pos) and self.at(new_pos) != Board.Cell.water:
                    self.move(a)  # do the actual move
                    break
            n += 1
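
To make the environment above concrete, here is a minimal usage sketch, assuming the image assets referenced in `__init__` are present; `random_policy` is a hypothetical helper written for this illustration, not part of the file:

```python
import random
import matplotlib.pyplot as plt
from rlboard import Board

m = Board(8, 8)         # an 8x8 world
m.randomize(seed=13)    # scatter water, trees, wolves and apples
m.plot()                # render the board
plt.show()

actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # left, right, up, down

def random_policy(board):
    # pick a random direction; walk() retries until the move is valid
    return random.choice(actions)

steps = m.walk(random_policy)
print("Eaten or drowned" if steps < 0 else f"Reached an apple in {steps} steps")
```
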
@ -0,0 +1,195 @@
# Maze simulation environment for Reinforcement Learning tutorial
# by Dmitry Soshnikov
# http://soshnikov.com

import matplotlib.pyplot as plt
import numpy as np
import cv2
import random
import math

def clip(min, max, x):
    # clamp x to the range [min, max]
    if x < min:
        return min
    if x > max:
        return max
    return x

def imload(fname, size):
    # load an image, convert BGR -> RGB, resize to size x size, normalize to [0, 1]
    img = cv2.imread(fname)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (size, size), interpolation=cv2.INTER_LANCZOS4)
    img = img / np.max(img)
    return img

def draw_line(dx, dy, size=50):
    # draw a small arrow glyph (a line with a circle at its head) pointing
    # in the direction (dx, dy); used to visualize a policy on the board
    p = np.ones((size-2, size-2, 3))
    if dx == 0:
        dx = 0.001
    m = (size-2)//2
    l = math.sqrt(dx*dx+dy*dy)*(size-4)/2
    a = math.atan(dy/dx)
    cv2.line(p, (int(m-l*math.cos(a)), int(m-l*math.sin(a))), (int(m+l*math.cos(a)), int(m+l*math.sin(a))), (0, 0, 0), 1)
    s = -1 if dx < 0 else 1
    cv2.circle(p, (int(m+s*l*math.cos(a)), int(m+s*l*math.sin(a))), 3, 0)
    return p

def probs(v):
    # normalize a vector of values into a probability distribution
    v = v - v.min()
    if v.sum() > 0:
        v = v / v.sum()
    return v

class Board:
    class Cell:
        empty = 0
        water = 1
        wolf = 2
        tree = 3
        apple = 4

    def __init__(self, width, height, size=50):
        self.width = width
        self.height = height
        self.size = size + 2
        self.matrix = np.zeros((width, height))
        self.grid_color = (0.6, 0.6, 0.6)
        self.background_color = (1.0, 1.0, 1.0)
        self.grid_thickness = 1
        self.grid_line_type = cv2.LINE_AA
        self.pics = {
            "wolf": imload('../images/wolf.png', size-4),
            "apple": imload('../images/apple.png', size-4),
            "human": imload('../images/human.png', size-4)
        }
        self.human = (0, 0)
        self.frame_no = 0

    def randomize(self, water_size=5, num_water=3, num_wolves=1, num_trees=5, num_apples=3, seed=None):
        # populate the board: random-walk water patches, then trees,
        # wolves and apples on randomly chosen empty cells
        if seed:
            random.seed(seed)
        for _ in range(num_water):
            x = random.randint(0, self.width-1)
            y = random.randint(0, self.height-1)
            for _ in range(water_size):
                self.matrix[x, y] = Board.Cell.water
                x = clip(0, self.width-1, x+random.randint(-1, 1))
                y = clip(0, self.height-1, y+random.randint(-1, 1))
        for _ in range(num_trees):
            while True:
                x = random.randint(0, self.width-1)
                y = random.randint(0, self.height-1)
                if self.matrix[x, y] == Board.Cell.empty:
                    self.matrix[x, y] = Board.Cell.tree
                    break
        for _ in range(num_wolves):
            while True:
                x = random.randint(0, self.width-1)
                y = random.randint(0, self.height-1)
                if self.matrix[x, y] == Board.Cell.empty:
                    self.matrix[x, y] = Board.Cell.wolf
                    break
        for _ in range(num_apples):
            while True:
                x = random.randint(0, self.width-1)
                y = random.randint(0, self.height-1)
                if self.matrix[x, y] == Board.Cell.empty:
                    self.matrix[x, y] = Board.Cell.apple
                    break

    def at(self, pos=None):
        # cell type at pos, or at the human's position if pos is omitted
        if pos:
            return self.matrix[pos[0], pos[1]]
        else:
            return self.matrix[self.human[0], self.human[1]]

    def is_valid(self, pos):
        return pos[0] >= 0 and pos[0] < self.width and pos[1] >= 0 and pos[1] < self.height

    def move_pos(self, pos, dpos):
        return (pos[0] + dpos[0], pos[1] + dpos[1])

    def move(self, dpos):
        # ignore moves that would leave the board
        new_pos = self.move_pos(self.human, dpos)
        if self.is_valid(new_pos):
            self.human = new_pos

    def random_pos(self):
        x = random.randint(0, self.width-1)
        y = random.randint(0, self.height-1)
        return (x, y)

    def random_start(self):
        # place the human on a random empty cell
        while True:
            pos = self.random_pos()
            if self.at(pos) == Board.Cell.empty:
                self.human = pos
                break

    def image(self, Q=None):
        # render the board as an RGB image; if a Q-table is given, draw a
        # direction glyph on every empty cell showing the preferred action
        img = np.zeros((self.height*self.size+1, self.width*self.size+1, 3))
        img[:, :, :] = self.background_color
        # Draw the cells
        for x in range(self.width):
            for y in range(self.height):
                if (x, y) == self.human:
                    ov = self.pics['human']
                    img[self.size*y+2:self.size*y+ov.shape[0]+2, self.size*x+2:self.size*x+2+ov.shape[1], :] = np.minimum(ov, 1.0)
                    continue
                if self.matrix[x, y] == Board.Cell.water:
                    img[self.size*y:self.size*(y+1), self.size*x:self.size*(x+1), :] = (0, 0, 1.0)
                if self.matrix[x, y] == Board.Cell.wolf:
                    ov = self.pics['wolf']
                    img[self.size*y+2:self.size*y+ov.shape[0]+2, self.size*x+2:self.size*x+2+ov.shape[1], :] = np.minimum(ov, 1.0)
                if self.matrix[x, y] == Board.Cell.apple:
                    ov = self.pics['apple']
                    img[self.size*y+2:self.size*y+ov.shape[0]+2, self.size*x+2:self.size*x+2+ov.shape[1], :] = np.minimum(ov, 1.0)
                if self.matrix[x, y] == Board.Cell.tree:
                    img[self.size*y:self.size*(y+1), self.size*x:self.size*(x+1), :] = (0, 1.0, 0)
                if self.matrix[x, y] == Board.Cell.empty and Q is not None:
                    p = probs(Q[x, y])
                    dx, dy = 0, 0
                    for i, (ddx, ddy) in enumerate([(-1, 0), (1, 0), (0, -1), (0, 1)]):
                        dx += ddx*p[i]
                        dy += ddy*p[i]
                    l = draw_line(dx, dy, self.size)
                    img[self.size*y+2:self.size*y+l.shape[0]+2, self.size*x+2:self.size*x+2+l.shape[1], :] = l
        # Draw the grid
        for i in range(self.height+1):
            img[:, i*self.size] = 0.3
        for j in range(self.width+1):
            img[j*self.size, :] = 0.3
        return img

    def plot(self, Q=None):
        plt.figure(figsize=(11, 6))
        plt.imshow(self.image(Q), interpolation='hanning')

    def saveimage(self, filename, Q=None):
        # scale [0, 1] RGB to [0, 255] BGR for cv2.imwrite
        cv2.imwrite(filename, 255*self.image(Q)[..., ::-1])

    def walk(self, policy, save_to=None, start=None):
        # follow the policy until reaching an apple (returns the number of
        # steps taken) or stepping on a wolf or into water (returns -1)
        n = 0
        if start:
            self.human = start
        else:
            self.random_start()
        while True:
            if save_to:
                self.saveimage(save_to.format(self.frame_no))
                self.frame_no += 1
            if self.at() == Board.Cell.apple:
                return n  # success!
            if self.at() in [Board.Cell.wolf, Board.Cell.water]:
                return -1  # eaten by wolf or drowned
            while True:
                a = policy(self)
                new_pos = self.move_pos(self.human, a)
                if self.is_valid(new_pos) and self.at(new_pos) != Board.Cell.water:
                    self.move(a)  # do the actual move
                    break
            n += 1
@ -1,9 +0,0 @@
# [Assignment Name]

## Instructions

## Rubric

| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | --------- | -------- | ----------------- |
| | | | |
@ -1,12 +1,41 @@
# Getting Started with Reinforcement Learning

[![Intro to Reinforcement Learning](https://img.youtube.com/vi/lDq_en8RNOo/0.jpg)](https://www.youtube.com/watch?v=lDq_en8RNOo)

## Regional Topic: Peter and the Wolf (Russia)

[Peter and the Wolf](https://en.wikipedia.org/wiki/Peter_and_the_Wolf) is a musical fairy tale written by the Russian composer [Sergei Prokofiev](https://en.wikipedia.org/wiki/Sergei_Prokofiev). It is a story about the young pioneer Peter, who bravely goes out of his house to the forest clearing to chase the wolf. In this section, we will train machine learning algorithms that will help Peter:

* explore the surrounding area and build an optimal navigation map
* learn how to use a skateboard and balance on it, in order to move around faster

## Introduction to Reinforcement Learning

In previous sections, you have seen two examples of machine learning problems:

* **Supervised**, where we have datasets with sample solutions to the problem we want to solve. [Classification][Classification] and [regression][Regression] are supervised learning tasks.
* **Unsupervised**, in which we do not have labeled training data. The main example of unsupervised learning is [clustering][Clustering].

In this section, we will introduce you to a new type of learning problem that does not require labeled training data. There are several types of such problems:

* **[Semi-supervised learning](https://en.wikipedia.org/wiki/Semi-supervised_learning)**, where we have a lot of unlabeled data that can be used to pre-train the model.
* **[Reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning)**, in which an agent learns how to behave by performing many experiments in some simulated environment.

Suppose you want to teach a computer to play a game, such as chess, or [Super Mario](https://en.wikipedia.org/wiki/Super_Mario). For a computer to play a game, we need it to predict which move to make in each game state. While this may seem like a classification problem, it is not - because we do not have a dataset of states and corresponding actions. While we may have some data like that (existing chess matches, or recordings of players playing Super Mario), it is unlikely to cover a sufficiently large number of possible states.

Instead of looking for existing game data, **reinforcement learning** (RL) is based on the idea of *making the computer play* many times and observing the result. Thus, to apply reinforcement learning, we need two things:

1. **An environment** and **a simulator**, which allow us to play a game many times. This simulator defines all the game rules, possible states and actions.
2. **A reward function**, which tells us how well we did during each move or game.

The main difference from supervised learning is that in RL we typically do not know whether we win or lose until we finish the game. Thus, we cannot say whether a certain move alone is good or not - we only receive a reward at the end of the game. Our goal is to design algorithms that allow us to train a model under such uncertain conditions. We will learn about one RL algorithm called **Q-learning**.
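
As a small preview, the heart of Q-learning is a simple table update; a minimal sketch (the learning rate, discount factor, and table sizes below are illustrative assumptions, and the details are covered in the first lesson) looks like this:

```python
import numpy as np

# Q maps each (state, action) pair to an estimate of the expected future reward.
n_states, n_actions = 100, 4           # illustrative sizes
Q = np.zeros((n_states, n_actions))

alpha, gamma = 0.3, 0.9                # learning rate and discount factor

def q_update(state, action, reward, next_state):
    # move Q[state, action] toward the observed reward plus the
    # discounted best value achievable from the next state
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])
```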

## Lessons

1. [Introduction to Reinforcement Learning and Q-Learning](1-qlearning/README.md)
2. [Using gym simulation environment](2-gym/README.md)

## Credits

"Introduction to Reinforcement Learning" was written with ♥️ by [Dmitry Soshnikov](http://soshnikov.com)

[Classification]: ../4-Classification/README.md
[Regression]: ../2-Regression/README.md
[Clustering]: ../5-Clustering/README.md