Each experiment is run with 3 seeds and trained for 10M environment steps. The Clipped PPO hyperparameters are the same as those described in the original paper.
coach -p Mujoco_ClippedPPO -lvl inverted_pendulum
coach -p Mujoco_ClippedPPO -lvl inverted_double_pendulum
coach -p Mujoco_ClippedPPO -lvl reacher
coach -p Mujoco_ClippedPPO -lvl hopper
coach -p Mujoco_ClippedPPO -lvl half_cheetah
coach -p Mujoco_ClippedPPO -lvl walker2d
coach -p Mujoco_ClippedPPO -lvl ant
coach -p Mujoco_ClippedPPO -lvl swimmer
coach -p Mujoco_ClippedPPO -lvl humanoid
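
The commands above differ only in the level name, so they could be generated from a single loop. The following is a minimal sketch rather than part of the original benchmark instructions: `LEVELS` and `print_commands` are hypothetical names introduced here for illustration, and the script only prints the commands instead of launching training. It does not select seeds, since the text does not say how the 3 seeds were chosen.

```shell
#!/bin/sh
# Level names copied from the command list above.
# LEVELS and print_commands are illustrative names, not part of Coach.
LEVELS="inverted_pendulum inverted_double_pendulum reacher hopper half_cheetah walker2d ant swimmer humanoid"

# Print one coach command per level, using the same preset as above.
print_commands() {
    for lvl in $LEVELS; do
        echo "coach -p Mujoco_ClippedPPO -lvl $lvl"
    done
}

print_commands
```

Assuming `coach` is on the `PATH`, piping the script's output to `sh` would run the experiments one after another. Printing first makes it easy to check the list before committing to nine 10M-step training runs.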