This project explores the mathematical and programming foundations of several policy gradient deep reinforcement learning algorithms, applied to the Super Mario Bros. environment for OpenAI Gym. Policy Gradient algorithms in Reinforcement Learning are those that search for the best policy directly: they parameterize the policy and adjust its parameters by gradient ascent on the expected return.
This project explores the following Policy Gradient algorithms: REINFORCE (with rewards-to-go), Advantage Actor-Critic (A2C), and Trust Region/Proximal Policy Optimization (TRPO/PPO). For comparison, it also includes a Deep Q-Network (DQN), a value-based method rather than a policy gradient one.
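The quantities that distinguish these policy gradient methods fit in a few lines. The following is a minimal NumPy sketch, not this project's implementation: the function names, the discount `gamma`, and the clip range `clip_eps=0.2` are assumptions chosen for illustration. In a real agent the log-probabilities would come from the policy network inside an autodiff framework such as PyTorch so that gradients flow through them; here they are plain arrays, so only the values are computed.

```python
import numpy as np


def rewards_to_go(rewards, gamma=0.99):
    """Discounted reward-to-go for one episode: G_t = sum_{k>=t} gamma^(k-t) * r_k."""
    rtg = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Sweep backwards so each G_t reuses G_{t+1}: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE surrogate loss weighted by rewards-to-go.

    log_probs[t] is log pi(a_t | s_t). Minimizing this loss performs gradient
    ascent on expected return when log_probs come from an autodiff framework.
    """
    weights = rewards_to_go(rewards, gamma)
    return -np.mean(np.asarray(log_probs, dtype=np.float64) * weights)


def one_step_advantages(rewards, values, next_values, dones, gamma=0.99):
    """A2C-style advantage: A_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    next_values = np.asarray(next_values, dtype=np.float64)
    # Terminal transitions do not bootstrap from the next state's value.
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    return rewards + gamma * next_values * not_done - values


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be maximized)."""
    # Probability ratio pi_new(a_t|s_t) / pi_old(a_t|s_t), computed in log space.
    ratio = np.exp(np.asarray(new_log_probs, dtype=np.float64)
                   - np.asarray(old_log_probs, dtype=np.float64))
    advantages = np.asarray(advantages, dtype=np.float64)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The elementwise minimum is a pessimistic bound: pushing the ratio
    # outside [1 - eps, 1 + eps] earns no extra objective.
    return np.mean(np.minimum(unclipped, clipped))
```

For example, `rewards_to_go([1, 1, 1], gamma=0.5)` returns `[1.75, 1.5, 1.0]`: each step is weighted only by the rewards that follow it, which is what reduces variance relative to weighting every step by the full episode return.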
Note: all models are uploaded except the PPO model, which is too large for GitHub.