Q-Value based
Replay Buffer
Distributional
Exploration
- Noisy Networks for Exploration
- Curiosity-driven Exploration (ICM)
- Random Network Distillation (RND)
Combination
Distributed
- Distributed Prioritized Experience Replay (Ape-X)
- Recurrent Experience Replay in Distributed RL (R2D2) (🚧 implementing…)
Policy Optimization, Actor-Critic
- REINFORCE [Discrete, Continuous]
- Deep Deterministic Policy Gradient (DDPG)
- Proximal Policy Optimization (PPO) [Discrete, Continuous]
- Soft Actor Critic (SAC) [Continuous]
- Maximum a posteriori Policy Optimization (MPO) [Discrete, Continuous]
- V-MPO [Discrete, Continuous]