A custom build of the `imitation` library for training a recurrent reward model; the recurrent unit is a GRU.
To support `Dict`-type observations, `dict_preference.py` and `dict_reward_nets.py` were added.
- GRU reward net: supports `Dict`-type observations
- Non-GRU reward net: supports `Dict`-type observations
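To illustrate what a GRU reward net over `Dict` observations looks like, here is a minimal sketch in PyTorch. The class name, the observation keys (`"pos"`, `"vel"`), and all sizes are illustrative assumptions, not the actual API of `dict_reward_nets.py`:

```python
import torch
import torch.nn as nn

class GRURewardNet(nn.Module):
    """Sketch: a recurrent reward net that scores (obs, action) sequences.

    Hypothetical example; key names and dimensions are made up for
    illustration and do not match the repo's dict_reward_nets.py.
    """

    def __init__(self, obs_key_dims: dict, action_dim: int, hidden_size: int = 64):
        super().__init__()
        # Flatten every entry of the Dict observation into one vector.
        in_dim = sum(obs_key_dims.values()) + action_dim
        self.gru = nn.GRU(in_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, obs: dict, actions: torch.Tensor) -> torch.Tensor:
        # obs[k]: (batch, seq_len, dim_k); actions: (batch, seq_len, action_dim)
        flat = torch.cat([obs[k] for k in sorted(obs)] + [actions], dim=-1)
        out, _ = self.gru(flat)            # out: (batch, seq_len, hidden_size)
        return self.head(out).squeeze(-1)  # per-step reward: (batch, seq_len)

# Toy usage: batch of 4 trajectories, 10 steps each.
net = GRURewardNet({"pos": 3, "vel": 2}, action_dim=1)
obs = {"pos": torch.randn(4, 10, 3), "vel": torch.randn(4, 10, 2)}
rew = net(obs, torch.randn(4, 10, 1))  # rew.shape == (4, 10)
```

The GRU gives the reward at step *t* access to the whole history up to *t*, which is the difference from the non-GRU (feed-forward) net that scores each step independently.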
- Training without ensembling: `python3 train.py`
- Training with ensembling: `python3 train_ensemble.py`
To evaluate a trained model, go to `test_net` and adapt the code to your setup.
Environments:
- CartPole
- BipedalWalker
- Pendulum-v1
- MountainCar
Dependencies:
- torch: 1.13.1+cu116
- imitation: 1.0.0
- stable_baselines3: 2.3.0
[Note]
To enforce a fixed horizon, the environment is wrapped with the `AbsorbAfterDoneWrapper` from seals.
All hyperparameters are identical across the runs below; the only difference is whether the reward net has a recurrent (GRU) core.
- GRU
GRU_reward_50-episode-1.mp4
- No GRU
Non_GRU_reward_50-episode-1.mp4
- GRU w/ ensemble
GRU_reward-episode-0.mp4
- No GRU w/ ensemble
Non_GRU_reward-episode-0.mp4
GRU-PPO for the stable-baselines3 (or sb3-contrib) library
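sb3-contrib's `RecurrentPPO` ships with LSTM policies; a GRU variant would swap the LSTM core for `nn.GRU` while keeping the same rollout/hidden-state plumbing. Below is a hedged sketch of the GRU actor-critic core such a policy would need; the class name and all sizes are illustrative, not an actual stable-baselines3 or sb3-contrib API:

```python
import torch
import torch.nn as nn

class GRUActorCritic(nn.Module):
    """Sketch of a GRU actor-critic core for a hypothetical GRU-PPO policy.

    Illustrative only: not the stable-baselines3 / sb3-contrib API.
    """

    def __init__(self, obs_dim: int, action_dim: int, hidden_size: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_size, batch_first=True)
        self.policy_head = nn.Linear(hidden_size, action_dim)  # action logits
        self.value_head = nn.Linear(hidden_size, 1)            # state value

    def forward(self, obs, hidden=None):
        # obs: (batch, seq_len, obs_dim); hidden: (1, batch, hidden_size)
        feats, hidden = self.gru(obs, hidden)
        logits = self.policy_head(feats)                # (batch, seq_len, action_dim)
        values = self.value_head(feats).squeeze(-1)     # (batch, seq_len)
        return logits, values, hidden

# Toy usage: 8 rollouts of 5 steps, 4-dim obs, 2 discrete actions.
net = GRUActorCritic(obs_dim=4, action_dim=2)
logits, values, h = net(torch.randn(8, 5, 4))
```

The returned `hidden` state would be carried across rollout chunks during PPO's sequence-aware minibatching, the same bookkeeping `RecurrentPPO` already does for its LSTM states.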