0.4.1
API Change
- Add observation normalization in BaseVectorEnv (`norm_obs`, `obs_rms`, `update_obs_rms` and `RunningMeanStd`) (#308); see the sketch after this list
- Add `policy.map_action` to bound the raw action (e.g., map from (-inf, inf) to [-1, 1] by clipping or tanh squashing); the mapped action is not stored in the replay buffer (#313)
- Add `lr_scheduler` in on-policy algorithms, typically for `LambdaLR` (#318); demonstrated in the PPO sketch at the end of this section
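A minimal sketch of the new observation normalization, assuming Gym's `Pendulum-v0` as the environment; the variable names and the train/test statistics-sharing pattern are illustrative:

```python
import gym

from tianshou.env import DummyVectorEnv

env_fns = [lambda: gym.make("Pendulum-v0") for _ in range(4)]

# norm_obs=True makes the vector env maintain a RunningMeanStd over
# observations (exposed as obs_rms) and return normalized observations.
train_envs = DummyVectorEnv(env_fns, norm_obs=True)

# Reuse the training statistics at evaluation time and freeze them, so the
# test envs apply the same normalization without updating it.
test_envs = DummyVectorEnv(
    env_fns,
    norm_obs=True,
    obs_rms=train_envs.obs_rms,
    update_obs_rms=False,
)

obs = train_envs.reset()  # normalized observations, shape (4, obs_dim)
```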
Note
To adapt to this version, change `action_range=...` to `action_space=env.action_space` in policy initialization, as in the sketch below.
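A before/after migration sketch; `DDPGPolicy` stands in for any continuous-control policy, and its network/optimizer arguments are elided, so the policy lines are shown as comments:

```python
import gym

env = gym.make("Pendulum-v0")

# Before 0.4.1, the action bound was passed as a raw (low, high) tuple:
# policy = DDPGPolicy(actor, actor_optim, critic, critic_optim,
#                     action_range=(env.action_space.low[0],
#                                   env.action_space.high[0]))

# From 0.4.1 on, pass the action space itself:
# policy = DDPGPolicy(actor, actor_optim, critic, critic_optim,
#                     action_space=env.action_space)
```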
Bug Fix
- Fix incorrect behaviors with on-policy algorithms (an error when `n/ep == 0`, and the reward shown in tqdm) (#306, #328)
- Fix the q-value `mask_action` error for `obs_next` (#310)
Enhancement
- Release SOTA Mujoco benchmark (DDPG/TD3/SAC: #305, REINFORCE: #320, A2C: #325, PPO: #330) and add corresponding notes in /examples/mujoco/README.md
- Fix `numpy>=1.20` typing issue (#323)
- Add cross-platform unittest (#331)
- Add a test on how to deal with finite env (#324)
- Add value normalization in on-policy algorithms (#319, #321)
- Separate advantage normalization and value normalization in PPO (#329); see the sketch after this list
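A hedged PPO sketch that ties these pieces together: it passes `action_space` (per the Note above), the `lr_scheduler` hook (#318), and the two normalization switches. The keyword names `reward_normalization` and `advantage_normalization`, the network helpers, and all hyperparameters are assumptions based on the 0.4.1 API; check the `PPOPolicy` signature of your version:

```python
import gym
import torch
from torch.distributions import Independent, Normal
from torch.optim.lr_scheduler import LambdaLR

from tianshou.policy import PPOPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.continuous import ActorProb, Critic

env = gym.make("Pendulum-v0")

# Simple MLP actor/critic; sizes are illustrative.
net_a = Net(env.observation_space.shape, hidden_sizes=[64, 64])
actor = ActorProb(net_a, env.action_space.shape)
net_c = Net(env.observation_space.shape, hidden_sizes=[64, 64])
critic = Critic(net_c)
optim = torch.optim.Adam(
    list(actor.parameters()) + list(critic.parameters()), lr=3e-4)

# Turn the actor's (mu, sigma) output into a torch distribution.
def dist_fn(*logits):
    return Independent(Normal(*logits), 1)

max_update_num = 100  # illustrative: total number of policy updates

policy = PPOPolicy(
    actor, critic, optim, dist_fn,
    action_space=env.action_space,  # replaces action_range (see Note)
    # decay the learning rate linearly to zero over training (#318)
    lr_scheduler=LambdaLR(optim, lr_lambda=lambda n: 1 - n / max_update_num),
    reward_normalization=True,      # value normalization (#319, #321)
    advantage_normalization=True,   # per-minibatch advantage norm (#329)
)
```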