NOTE: This repository is no longer maintained. I recommend using the implementation here instead; it is much more fully featured and better tested.
This is a PyTorch implementation of Proximal Policy Optimization (PPO). The code is mostly ported from the OpenAI baselines implementation, but it does not yet run several optimization epochs over each batch; I will add this soon.
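For reference, the missing piece looks roughly like the minimal sketch below: a PPO update that reuses one batch of rollout data for several optimization epochs with the clipped surrogate objective. The `policy.log_prob` method and the `batch` fields are illustrative assumptions, not this repo's actual API.

```python
import torch

def ppo_update(policy, optimizer, batch, clip_eps=0.2, epochs=4):
    # Reuse the same batch of rollout data for several optimization epochs.
    for _ in range(epochs):
        # Assumed interface: per-sample log-probabilities of the taken actions.
        log_probs = policy.log_prob(batch["states"], batch["actions"])
        # Probability ratio between the current policy and the rollout policy.
        ratio = torch.exp(log_probs - batch["old_log_probs"])
        # PPO clipped surrogate objective (Schulman et al., 2017).
        surr1 = ratio * batch["advantages"]
        surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * batch["advantages"]
        loss = -torch.min(surr1, surr2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```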
```
python main.py --env-name Walker2d-v1
```
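Walker2d-v1 is a MuJoCo continuous-control environment, so running this example assumes gym with the MuJoCo bindings (mujoco-py) is installed.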
Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request. Remaining work:
- Add multiple epochs per batch
- Compare results against the OpenAI baselines code