
0.4.1

@Trinkle23897 released this 04 Apr 09:36 · 589 commits to master since this release · dd4a011

API Change

  1. Add observation normalization in BaseVectorEnv (norm_obs, obs_rms, update_obs_rms, and RunningMeanStd) (#308); see the sketch right after this list
  2. Add policy.map_action to bound the raw action (e.g., map from (-inf, inf) to [-1, 1] by clipping or tanh squashing); the mapped action is not stored in the replay buffer (#313)
  3. Add lr_scheduler support in on-policy algorithms, typically used with torch.optim.lr_scheduler.LambdaLR (#318); items 2 and 3 are illustrated in the policy sketch after the note below
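A minimal sketch of the new observation normalization (item 1), following the pattern used in the mujoco example scripts; the environment name and the number of workers are placeholders, and the keyword names are taken from #308:

```python
import gym
from tianshou.env import DummyVectorEnv

def make_env():
    return gym.make("Pendulum-v0")

# Training envs keep running mean/std statistics of observations (exposed as
# .obs_rms, a RunningMeanStd instance) and normalize every observation returned.
train_envs = DummyVectorEnv([make_env for _ in range(4)], norm_obs=True)

# Test envs reuse the training statistics and freeze them, so both sides see
# observations on the same scale.
test_envs = DummyVectorEnv(
    [make_env for _ in range(2)],
    norm_obs=True,
    obs_rms=train_envs.obs_rms,
    update_obs_rms=False,
)
```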

Note

To adapt to this version, change action_range=... to action_space=env.action_space in policy initialization; a sketch follows below.
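A minimal sketch of a policy initialization under the new convention, loosely based on the mujoco example scripts of this release; the environment, network sizes, learning rate, and decay horizon are placeholders, and the exact keyword set of PPOPolicy should be checked against the release itself:

```python
import gym
import torch
from torch import nn
from torch.distributions import Independent, Normal
from torch.optim.lr_scheduler import LambdaLR

from tianshou.policy import PPOPolicy
from tianshou.utils.net.common import Net
from tianshou.utils.net.continuous import ActorProb, Critic

env = gym.make("Pendulum-v0")

net_a = Net(env.observation_space.shape, hidden_sizes=(64, 64), activation=nn.Tanh)
actor = ActorProb(net_a, env.action_space.shape, unbounded=True)
net_c = Net(env.observation_space.shape, hidden_sizes=(64, 64), activation=nn.Tanh)
critic = Critic(net_c)
optim = torch.optim.Adam(
    list(actor.parameters()) + list(critic.parameters()), lr=3e-4
)

# lr_scheduler (#318): decay the learning rate linearly over the planned
# number of policy updates; it is stepped once per update.
max_update_num = 1000  # placeholder
lr_scheduler = LambdaLR(optim, lr_lambda=lambda epoch: 1 - epoch / max_update_num)

def dist(*logits):
    return Independent(Normal(*logits), 1)

policy = PPOPolicy(
    actor, critic, optim, dist,
    action_space=env.action_space,  # replaces the old action_range=...
    lr_scheduler=lr_scheduler,
)
# map_action (#313): the Collector maps the raw (-inf, inf) network output into
# the action space right before stepping the environment; the replay buffer
# keeps the unmapped action.
```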

Bug Fix

  1. Fix incorrect behaviors in on-policy algorithms (an error when n/ep == 0, and the reward shown in tqdm) (#306, #328)
  2. Fix q-value mask_action error for obs_next (#310)

Enhancement

  1. Release SOTA Mujoco benchmark (DDPG/TD3/SAC: #305, REINFORCE: #320, A2C: #325, PPO: #330) and add corresponding notes in /examples/mujoco/README.md
  2. Fix numpy>=1.20 typing issue (#323)
  3. Add cross-platform unittest (#331)
  4. Add a test on how to deal with finite env (#324)
  5. Add value normalization in on-policy algorithms (#319, #321)
  6. Separate advantage normalization and value normalization in PPO (#329); see the sketch after this list
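For items 5 and 6, the two mechanisms are now independent switches: value (return) normalization keeps running statistics across updates, while advantage normalization standardizes advantages within each mini-batch. A rough, self-contained sketch of the idea (not the library's code; the exact formulas inside tianshou may differ):

```python
import numpy as np

# --- value/return normalization: running statistics persist across updates ---
class RunningStats:
    """Rough stand-in for the running mean/std used for return normalization."""
    def __init__(self):
        self.mean, self.var, self.count = 0.0, 1.0, 1e-4

    def update(self, x):
        m, v, n = x.mean(), x.var(), x.size
        delta, total = m - self.mean, self.count + n
        new_mean = self.mean + delta * n / total
        new_var = (self.var * self.count + v * n
                   + delta ** 2 * self.count * n / total) / total
        self.mean, self.var, self.count = new_mean, new_var, total

ret_rms = RunningStats()
returns = np.random.randn(256) * 10 + 5          # per-step returns, e.g. from GAE
ret_rms.update(returns)
norm_returns = (returns - ret_rms.mean) / np.sqrt(ret_rms.var + 1e-8)

# --- advantage normalization: standardize within the current mini-batch ---
adv = np.random.randn(64)
norm_adv = (adv - adv.mean()) / (adv.std() + 1e-8)
```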