RL-reimplementations

A from-scratch reimplementation of vanilla policy gradient.

My goal here was just to have some fun reimplementing DDPG (before A3C) and vanilla policy gradient.

The PG implementation is from scratch. For the DDPG implementation I originally wrote my own code, but later 'forked' Patrick Emami's blog post (http://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html) for its better OO design.

Advantage-PG on pendulum swing-up:


Full video: https://youtu.be/pf-ATPFff74
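For reference, the sketch below shows roughly what an advantage-weighted vanilla policy gradient (REINFORCE with a learned value baseline) update looks like on this task. It is purely illustrative and is not this repository's code: the framework (PyTorch), the Gym >= 0.26 step API, the environment id Pendulum-v1, the network sizes, and the learning rate are all assumptions made for the example.

```python
# Illustrative advantage policy gradient (REINFORCE + value baseline) sketch.
# All hyper-parameters and the framework choice are assumptions, not this repo's settings.
import gym
import torch
import torch.nn as nn

env = gym.make("Pendulum-v1")                 # continuous control: 3-dim obs, 1-dim action
obs_dim = env.observation_space.shape[0]
act_dim = env.action_space.shape[0]

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
baseline = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent Gaussian exploration noise
opt = torch.optim.Adam(
    list(policy.parameters()) + list(baseline.parameters()) + [log_std], lr=3e-4
)

gamma = 0.99
for epoch in range(200):
    # Roll out one episode with the current stochastic policy
    obs_list, act_list, rew_list = [], [], []
    obs, _ = env.reset()
    done = False
    while not done:
        o = torch.as_tensor(obs, dtype=torch.float32)
        dist = torch.distributions.Normal(policy(o), log_std.exp())
        act = dist.sample()
        obs, rew, terminated, truncated, _ = env.step(act.numpy())
        obs_list.append(o)
        act_list.append(act)
        rew_list.append(rew)
        done = terminated or truncated

    # Discounted returns, computed backwards from the end of the episode
    returns, g = [], 0.0
    for r in reversed(rew_list):
        g = r + gamma * g
        returns.insert(0, g)

    obs_t = torch.stack(obs_list)
    act_t = torch.stack(act_list)
    ret_t = torch.as_tensor(returns, dtype=torch.float32)

    # Advantage = empirical return minus the learned state-value baseline
    values = baseline(obs_t).squeeze(-1)
    adv = (ret_t - values).detach()

    dist = torch.distributions.Normal(policy(obs_t), log_std.exp())
    logp = dist.log_prob(act_t).sum(-1)
    loss = -(logp * adv).mean() + ((ret_t - values) ** 2).mean()  # PG term + baseline regression

    opt.zero_grad()
    loss.backward()
    opt.step()
```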

Stuff I learned

  • TRPO is really hard to implement
  • DDPG is extremely sensitive to hyper-parameters. I couldn't get any experiments to work without following the prescribed values exactly.
  • Both algorithms are very sensitive to the random seed
  • It's sometimes better not to use OO design until you've got a working draft
