This repository contains a Jupyter Notebook with an implementation of a Q-Learning agent that learns to solve the n-Chain OpenAI Gym environment.
This notebook is inspired by the following notebook: Deep Reinforcement Learning Course Notebook
The notebook contains a Q-Learning algorithm implementation and a training loop to solve the n-Chain OpenAI Gym environment. The Q-Learning algorithm is an off-policy temporal-difference control algorithm [1]:
Image taken from Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second edition, 2014/2015, page 158
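As a rough illustration of the update rule at the heart of that algorithm, here is a minimal tabular sketch. The function name `q_learning_update`, the table shape, and the default values for `alpha` and `gamma` are assumptions made for this example; the notebook's own code may be organised differently.

```python
import numpy as np

def q_learning_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Off-policy target: bootstrap from the best action in the next state,
    # regardless of which action the behaviour policy takes there.
    td_target = reward + gamma * np.max(q_table[next_state])
    td_error = td_target - q_table[state, action]
    # Move the current estimate a fraction alpha towards the target.
    q_table[state, action] += alpha * td_error
    return td_error

# Assumed layout: 5 chain states (rows) x 2 actions (columns), initialised to zero.
q_table = np.zeros((5, 2))
td_error = q_learning_update(q_table, state=4, action=0, reward=10, next_state=4)
```

Taking the maximum over the next state's action values is what makes the method off-policy: the target does not depend on the action the agent actually picks next.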
In this notebook, we let different Q-Learning agents play the n-Chain environment and see how they perform. The following agents are implemented:
- 🤓 Smart Agent 1: the agent explores and takes future rewards into account
- 🤑 Greedy Agent 2: the agent cares only about immediate rewards (small gamma)
- 😳 Shy Agent 3: the agent doesn't explore the environment (small epsilon)
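The three agents above differ only in their hyperparameters. The sketch below shows one way this could look with epsilon-greedy action selection. The concrete `gamma` and `epsilon` values and the names `AGENTS` and `choose_action` are hypothetical and chosen for illustration; the shy agent uses `epsilon = 0.0` as the extreme case of "small epsilon".

```python
import numpy as np

# Hypothetical hyperparameters; the notebook's actual values may differ.
AGENTS = {
    "smart": {"gamma": 0.95, "epsilon": 0.1},   # explores, values future rewards
    "greedy": {"gamma": 0.05, "epsilon": 0.1},  # explores, but discounts the future heavily
    "shy": {"gamma": 0.95, "epsilon": 0.0},     # never explores
}

def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return int(rng.integers(len(q_row)))
    # Exploit: pick the action with the highest current estimate.
    return int(np.argmax(q_row))

rng = np.random.default_rng(0)
q_row = np.array([0.0, 2.0])  # assumed Q-values for actions 0 and 1 in one state
shy_actions = {choose_action(q_row, AGENTS["shy"]["epsilon"], rng) for _ in range(100)}
```

With `epsilon = 0.0` the agent always repeats whichever action currently looks best, so it can get stuck on the small reward without ever discovering the end of the chain.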
The n-Chain environment is taken from the OpenAI Gym module. Documentation:
The image below shows an example of a 5-Chain (n = 5) environment with 5 states. `a` stands for the action and `r` for the reward (Image Source).
This environment contains a chain with n positions, and every chain position corresponds to a possible state the agent can be in:
state | description |
---|---|
n (default n=5) | n-th position on the chain |
The agent can move along the chain using two actions, each of which yields a different reward:
action | reward | description |
---|---|---|
0 | get no reward | move forward along the chain (next state = current state + 1) |
1 | get a small reward of 2 | jump back to state 0 |
The end of the chain yields a large reward of 10. While standing at the end of the chain, the agent can collect this large reward repeatedly by continuing to move forward (action 0).
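To make these rules concrete, here is a small stand-in for the environment together with a comparison of two fixed strategies. `ChainSketch` and `discounted_return` are hypothetical names; this is not the Gym class. The sketch is fully deterministic, ignores any randomness the Gym environment may add, and its `step` returns only `(state, reward)` rather than Gym's full return tuple.

```python
class ChainSketch:
    """Deterministic stand-in for the n-Chain rules described above (not the Gym class)."""

    def __init__(self, n=5, small=2, large=10):
        self.n, self.small, self.large = n, small, large
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            # Jump back to the start and collect the small reward.
            self.state = 0
            return self.state, self.small
        if self.state == self.n - 1:
            # Already at the end: moving forward keeps paying the large reward.
            return self.state, self.large
        # Otherwise move one position forward without any reward.
        self.state += 1
        return self.state, 0


def discounted_return(action, gamma, steps=50):
    """Discounted return of repeating a single action for a fixed number of steps."""
    env = ChainSketch()
    env.reset()
    total = 0.0
    for t in range(steps):
        _, reward = env.step(action)
        total += (gamma ** t) * reward
    return total


# With a large gamma, walking to the end of the chain beats jumping back.
far_sighted = discounted_return(action=0, gamma=0.95) > discounted_return(action=1, gamma=0.95)
# With a small gamma, the immediate small reward looks better.
short_sighted = discounted_return(action=0, gamma=0.5) > discounted_return(action=1, gamma=0.5)
```

Under these assumptions `far_sighted` is `True` and `short_sighted` is `False`, which is the trade-off that separates an agent with a large gamma from one with a small gamma.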
- OpenAI Gym: Gym is a toolkit for developing and comparing reinforcement learning algorithms from OpenAI
- OpenAI Baselines: OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms
- Spinning Up AI: This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning
- A Long Peek into Reinforcement Learning: Great blog post from Lilian Weng, in which she briefly covers the field of Reinforcement Learning (RL), from fundamental concepts to classic algorithms
- Policy Gradient Algorithms: Another great blog post from Lilian Weng, where she writes about policy gradient algorithms