Reinforcement Learning

License: MIT · Python 3.8

Installation

Install from source with setup.py:

$ python setup.py install
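
Alternatively, since the project ships a setup.py, installing with pip from the repository root should also work:

$ pip install .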

Overview

This repository contains code that implements algorithms and models from Sutton and Barto's book "Reinforcement Learning: An Introduction," a classic text that provides a comprehensive introduction to the field.

The code in this repository is organized into several modules, each covering a different topic.

Methods

  • Multi-Armed Bandits
    • Epsilon Greedy
    • Optimistic Initial Values
    • Gradient
    • α (non-stationary)
  • Model Based
    • Policy Evaluation
    • Policy Iteration
    • Value Iteration
  • Monte Carlo estimation and control
    • First-visit α-MC
    • Every-visit α-MC
    • MC with Exploring Starts
    • Off-policy MC, ordinary and weighted importance sampling
  • Temporal Difference
    • TD(n) estimation
    • n-step SARSA
    • n-step Q-learning
    • n-step Expected SARSA
    • Double Q-learning
    • n-step Tree Backup
  • Planning
    • Dyna-Q/Dyna-Q+
    • Prioritized Sweeping
    • Trajectory Sampling
    • MCTS
  • On-policy Prediction
    • Gradient MC
    • $n$-step semi-gradient TD
    • ANN
    • Least-Squares TD
    • Kernel-based
  • On-policy Control
    • Episodic semi-gradient
    • Semi-gradient n-step Sarsa
    • Differential Semi-gradient n-step Sarsa
  • Eligibility Traces
    • TD($\lambda$)
    • True Online
    • Sarsa($\lambda$)
    • True Online Sarsa($\lambda$)
  • Policy Gradient
    • REINFORCE: Monte Carlo Policy Gradient w/wo Baseline
    • Actor-Critic (episodic) w/wo eligibility traces
    • Actor-Critic (continuing) with eligibility traces

All model-free solvers work just by defining states, actions, and a transition function. The transition is a function that takes a state and an action and returns a tuple containing the next state and the reward, plus a boolean indicating whether the episode has terminated.

states: Sequence[Any]
actions: Sequence[Any]
transition: Callable[[Any, Any], Tuple[Tuple[Any, float], bool]]
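
For example, a transition for a toy two-state chain could look like the following (a minimal sketch, not part of the package, only illustrating the expected return shape):

states = [0, 1]
actions = ['stay', 'move']

def chain_transition(state, action):
    # 'move' from state 0 reaches the terminal state 1 and yields reward 1
    if state == 0 and action == 'move':
        return (1, 1.0), True
    # anything else keeps the agent where it is with no reward
    return (state, 0.0), False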

Examples

Single State Infinite Variance Example 5.5

import numpy as np

from mypyrl import off_policy_mc, ModelFreePolicy

states = [0]
actions = ['left', 'right']

def single_state_transition(state, action):
    if action == 'right':
        return (state, 0), True
    if action == 'left':
        threshold = np.random.random()
        if threshold > 0.9:
            return (state, 1), True
        else:
            return (state, 0), False

b = ModelFreePolicy(actions, states)  # behavior policy, equiprobable by default
pi = ModelFreePolicy(actions, states)  # target policy
pi.pi[0] = np.array([1, 0])  # always take 'left'

# estimate the state value function with ordinary and weighted importance sampling
vqpi_ord, samples_ord = off_policy_mc(states, actions, single_state_transition,
    policy=pi, b=b, ordinary=True, first_visit=True, gamma=1., n_episodes=1E4)

vqpi_w, samples_w = off_policy_mc(states, actions, single_state_transition, 
    policy=pi, b=b, ordinary=False, first_visit=True, gamma=1., n_episodes=1E4)


Monte Carlo Tree Search maze solving plot

s = START_XY       # initial position in the maze
budget = 500       # search budget (simulations) per MCTS call
cp = 1/np.sqrt(2)  # exploration constant
end = False
max_steps = 50     # maximum steps per simulation
while not end:
    action, tree = mcts(s, cp, budget, obstacle_maze, action_map, max_steps, eps=1)
    (s, _), end = obstacle_maze(s, action)

tree.plot()


Contributing

While the code in this package provides a basic implementation of the algorithms from the book, it is not necessarily the most efficient or most polished. If you have suggestions for improving the code, please feel free to open an issue.

Overall, this package provides a valuable resource for anyone interested in learning about reinforcement learning and implementing algorithms from scratch. It is by no means production ready.