
This project is an exploration of the mathematical and programming foundations of several policy gradient deep reinforcement learning algorithms applied to OpenAI Gym's Super Mario Bros.


ClarkQTIM/Policy-Gradient-Deep-Reinforcement-Algorithms-and-Super-Mario-Bros


Policy Gradient Deep Reinforcement Algorithms and Super Mario Bros

This project is an exploration of the mathematical and programming foundations of several policy gradient deep reinforcement learning algorithms applied to OpenAI Gym's Super Mario Bros. Policy gradient algorithms in reinforcement learning seek the policy, $\pi$, that maximizes the expected cumulative reward over trajectories.
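As a rough sketch of what "maximizing the rewards over trajectories" means, the objective and its gradient are commonly written as below. The notation is an assumption for illustration and is not taken from this repository: $\theta$ denotes the policy parameters, $\tau = (s_0, a_0, \dots, s_T, a_T)$ a trajectory, and $R(\tau)$ its total reward.

$$
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau) \right],
\qquad
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau) \right]
$$

Policy gradient methods estimate this gradient from sampled trajectories and take ascent steps on $\theta$.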

This project explores the following policy gradient algorithms: REINFORCE (with Rewards to Go), Advantage Actor Critic, and Trust Region/Proximal Policy Optimization. For comparison, it also includes a Deep Q-Network (DQN), which is a value-based method rather than a policy gradient one.
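To make the "Rewards to Go" variant of REINFORCE concrete, here is a minimal sketch of how the per-step return is usually computed: each action is credited only with the rewards that follow it. The function name `rewards_to_go` and the discount factor `gamma` are hypothetical and chosen for illustration; the repository's own implementation may differ.

```python
from typing import List


def rewards_to_go(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return the discounted sum of future rewards for each time step.

    Illustrative helper (not from the repository): `gamma` is an assumed
    discount factor; set it to 1.0 for undiscounted rewards to go.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step reuses the return of the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Three steps with reward 1 each and no discounting.
    print(rewards_to_go([1.0, 1.0, 1.0], gamma=1.0))
```

In REINFORCE, these values would typically weight the log-probability of each action in place of the full-trajectory return, which tends to reduce the variance of the gradient estimate.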

Note: all models are uploaded except the PPO model, which is too large for GitHub.
