RL-reimplementations

A from-scratch reimplementation of vanilla policy gradient.

My goal here was just to have some fun reimplementing DDPG (before A3C) and vanilla policy gradient.

The PG implementation is from scratch. For the DDPG implementation I originally wrote my own code, but later 'forked' Patrick Emami's blog post (http://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html) for its better OO design.

Advantage-PG on pendulum swing-up:


Full video: https://youtu.be/pf-ATPFff74
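For reference, the sketch below shows roughly what an advantage-weighted vanilla policy gradient (REINFORCE with a learned value baseline) update looks like on this task. It is purely illustrative and is not this repository's code: the framework (PyTorch), the Gym >= 0.26 step API, the environment id Pendulum-v1, the network sizes, and the learning rate are all assumptions made for the example.

```python
# Illustrative advantage policy gradient (REINFORCE + value baseline) sketch.
# All hyper-parameters and the framework choice are assumptions, not this repo's settings.
import gym
import torch
import torch.nn as nn

env = gym.make("Pendulum-v1")                 # continuous control: 3-dim obs, 1-dim action
obs_dim = env.observation_space.shape[0]
act_dim = env.action_space.shape[0]

policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
baseline = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent Gaussian exploration noise
opt = torch.optim.Adam(
    list(policy.parameters()) + list(baseline.parameters()) + [log_std], lr=3e-4
)

gamma = 0.99
for epoch in range(200):
    # Roll out one episode with the current stochastic policy
    obs_list, act_list, rew_list = [], [], []
    obs, _ = env.reset()
    done = False
    while not done:
        o = torch.as_tensor(obs, dtype=torch.float32)
        dist = torch.distributions.Normal(policy(o), log_std.exp())
        act = dist.sample()
        obs, rew, terminated, truncated, _ = env.step(act.numpy())
        obs_list.append(o)
        act_list.append(act)
        rew_list.append(rew)
        done = terminated or truncated

    # Discounted returns, computed backwards from the end of the episode
    returns, g = [], 0.0
    for r in reversed(rew_list):
        g = r + gamma * g
        returns.insert(0, g)

    obs_t = torch.stack(obs_list)
    act_t = torch.stack(act_list)
    ret_t = torch.as_tensor(returns, dtype=torch.float32)

    # Advantage = empirical return minus the learned state-value baseline
    values = baseline(obs_t).squeeze(-1)
    adv = (ret_t - values).detach()

    dist = torch.distributions.Normal(policy(obs_t), log_std.exp())
    logp = dist.log_prob(act_t).sum(-1)
    loss = -(logp * adv).mean() + ((ret_t - values) ** 2).mean()  # PG term + baseline regression

    opt.zero_grad()
    loss.backward()
    opt.step()
```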

Stuff I learned

  • TRPO is really hard to implement
  • DDPG is extremely sensitive to hyper-parameters. I couldn't get any experiments to work without following the prescribed values exactly.
  • Both algorithms are very sensitive to the random seed
  • It's sometimes better not to use OO design until you've got a working draft
