Deep Reinforcement Learning

Figure: policy gradient taxonomy

:: Basic policy gradient

Policy gradient is an on-policy method that seeks to optimize the policy directly, using the returns of sampled trajectories as weights. Those weights indicate how well the policy performed. Based on that knowledge, the algorithm updates the parameters of its policy to make actions leading to similarly good trajectories more likely and actions leading to similarly bad trajectories less likely. In the case of Deep Reinforcement Learning, the policy is parameterized by a neural network. For this essay, I've studied and implemented the basic version of policy gradient, also known as REINFORCE. I've also complemented my reading with the following resources:


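The update described above can be illustrated with a minimal sketch. It is not the code from this package: it assumes a linear softmax policy in plain NumPy instead of a neural network, and the names `discounted_returns`, `softmax` and `reinforce_gradient` are hypothetical helpers chosen for this example. Each step's log-probability gradient is weighted by the discounted return that followed it, so actions from good trajectories become more likely.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = sum over k >= t of gamma^(k-t) * r_k for every timestep t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk backward so each return reuses the one computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exps = np.exp(shifted)
    return exps / exps.sum()


def reinforce_gradient(theta, states, actions, rewards, gamma=0.99):
    """Monte Carlo policy gradient estimate from a single trajectory.

    Assumption for this sketch: theta has shape (n_features, n_actions)
    and the policy is softmax(state @ theta).
    """
    grad = np.zeros_like(theta)
    returns = discounted_returns(rewards, gamma)
    for state, action, ret in zip(states, actions, returns):
        probs = softmax(state @ theta)
        # Gradient of log pi(action | state) w.r.t. the logits:
        # one_hot(action) - probs.
        dlogits = -probs
        dlogits[action] += 1.0
        # Weight the log-probability gradient by the return that followed.
        grad += np.outer(state, dlogits) * ret
    return grad


# Usage: one gradient ascent step on a one-step trajectory.
theta = np.zeros((1, 2))            # 1 feature, 2 actions
states = [np.array([1.0])]
actions = [1]
rewards = [1.0]
learning_rate = 0.1                 # hypothetical value
theta = theta + learning_rate * reinforce_gradient(theta, states, actions, rewards)
```

The step is an ascent (`theta + learning_rate * grad`) because the objective is the expected return, which is maximized. A deep learning implementation would typically minimize the negative of the same weighted log-probability with an optimizer instead.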
Download the essay pdf

Watch recorded agent


The REINFORCE implementation:

Note: Use the --help flag for an explanation of how to use the package.

To watch the trained algorithm
cd DRLimplementation
python -m BasicPolicyGradient [--record] [--play_for]=max trajectories (default=10)  
To execute the training loop
cd DRLimplementation
python -m BasicPolicyGradient --train
To navigate through the computation graph in TensorBoard
tensorboard --logdir=DRLimplementation/BasicPolicyGradient/graph/runs

Trained agent in action