Hyperparameter tuning + TRPO
- added hyperparameter tuning using Optuna
- added A2C for continuous actions
- upgraded stable-baselines (v2.5.1)
- added support for TRPO + MPI training
- fixed frame stack loading
Now more than 100 trained agents are available.