Policy Gradient Deep Reinforcement Algorithms and Super Mario Bros

This project explores the mathematical and programming foundations of several policy gradient deep reinforcement learning algorithms, applied to OpenAI Gym's Super Mario Bros environment. Policy gradient algorithms seek the policy, $\pi$, that maximizes the expected reward over trajectories, and they do so by adjusting the policy's parameters directly along the gradient of that objective.
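As a rough sketch of what "maximizing the rewards over trajectories" means, the objective and its gradient can be written in the standard textbook form below. The symbols $\theta$ (policy parameters), $\tau$ (a trajectory), $R(\tau)$ (its total reward), and $T$ (the episode length) are notation assumed here for illustration; they are not taken from the notebooks.

$$
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau) \right],
\qquad
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau) \right]
$$

Each algorithm in this project can be read as a different way of estimating or constraining this gradient.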

The project covers the following policy gradient algorithms: REINFORCE (with Rewards to Go), Advantage Actor Critic, and Trust Region/Proximal Policy Optimization. For comparison, it also includes a Deep Q-Network, which is a value-based method rather than a policy gradient one.
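To illustrate the "Rewards to Go" idea used with REINFORCE, here is a minimal sketch, not the notebook's actual implementation. It weights each action by the (optionally discounted) sum of rewards from that step onward, rather than by the whole episode's return. The function name `rewards_to_go` and the `gamma` parameter are hypothetical and chosen only for this example.

```python
from typing import List


def rewards_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return, for each step t, the discounted sum of rewards from t to the end.

    Hypothetical helper for illustration; `gamma` is an assumed discount factor
    (1.0 means no discounting).
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the sum already computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    print(rewards_to_go([1.0, 2.0, 3.0]))  # [6.0, 5.0, 3.0]
```

In a REINFORCE update, these per-step values would typically multiply the log-probability of the action taken at the same step, which reduces variance compared with using the full-episode return for every action.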

Note: all trained models are uploaded except the PPO model, which is too large for GitHub.

Repository contents:

- Graphics
- Models
- Asynchronous Actor Critic.ipynb
- Dueling Double Deep Q-Network with PER.ipynb
- README.md
- REINFORCE with Reward to Go.ipynb
- Trust Region and Proximal Policy Optimization.ipynb

ClarkQTIM/Policy-Gradient-Deep-Reinforcement-Algorithms-and-Super-Mario-Bros
