An educational codebase demonstrating some of the most common RL algorithms.
These days I find myself simply hijacking an RL codebase whenever I want to use a particular algorithm or method. Oftentimes these codebases are general, which makes them great for research because they can be applied to a wide variety of environments. Unfortunately, this generality also makes them a pain to dissect and understand. This codebase is meant to be a collection of bare-bones, highly commented implementations of some of the cornerstone algorithms in modern RL.
This is not a production or research codebase. Do not use the code here for any kind of research beyond tinkering for your own understanding. Many of these implementations are not optimized, or even thoroughly tested, on anything but the most basic OpenAI environments (cartpole, pole balance, etc.). If you find glaring errors or issues, please don't hesitate to post about them!