feature(zjow): add new pipeline agent sac/ddpg/a2c (#637)
* polish code
* fix data type error for mujoco
* polish code
* polish code
* Add features
* fix base env manager readyimage
* polish code
* remove NoReturn
* remove NoReturn
* format code
* format code
* polish code
* polish code
* fix logger
* format code
* format code
* change api for ckpt; polish code
* polish code
* format code
* polish code
* fix load bug
* fix bug
* fix dtype error
* polish code
* polish code
* Add dqn agent
* add config
* add bonus/c51.py
* add c51 logit monitor
* add sac dqn agent
* add sac dqn agent demo in dizoo
* polish format
* polish code
* polish code
* fix ddpg bug
* merge nyz c51/dqn config and policy
* fix config
* remove mutistep_trainer
* fix bug
* polish code
* polish code
* polish code
* add Hopper demo
* polish code
* add property best
* add a2c pipeline
* add sac halfcheetah+walker2d
* fix a2c pipeline bug
* fix pipeline bug
* fix bug
* change config
* remove IMPALA pipeline
* format code
* polish code
* polish c51 and add ddpg halfcheetah walker2d
* add dizoo/common for zjow to review
* fix agent best method
* reset dizoo
* delete common
* polish for zjow to review
* polish code
* polish code
* fix bug
* fix bug
* polish c51
* add pg agent
* add pendulum config
* add c51_atari td3_pendulum,bipedalwalker ddpg_pendulum
* polish code
* polish code
* polish code
* add bipedalwalker_ddpg_config
* change config
* change bipedalwalker config and noframeskip
* polish c51-atari name
* add pong spaceinvaders and qbert for dqn
* polish code
* polish code; add env mode
* add rew_clip in ding_env_wrapper
* polish dqn atari
* add a2c continuous action space
* add a2c continuous action space
* add a2c continuous for mujoco
* add a2c continuous for mujoco
* add a2c continuous for mujoco
* add a2c mujoco config; add ppo atari config
* add a2c mujoco config; add ppo atari config
* fix a2c deploy bug
* Add bipedalwalker a2c
* polish code
* polish code
* polish code
* polish code
* polish code
* add pendulum a2c+pg
* add pg bipedalwalker+mujoco
* polish code for wandb sweep
* polish code for wandb sweep
* polish code for wandb sweep
* polish code for a2c mujoco
* add pg pendulum new pipeline
* fix scalar action bug in random collect
* polish pg algorithm
* add bonus pg config
* polish pg config
* polish config
* polish code
* change pendulum pg config
* fix continuous action dim=1 bug
* Add ppof lr scheduler
* polish config
* fix random collect bug for dqn
* polish ppo qbert spaceinvader config
* remove mujoco wrapper
* polish a2c mujoco config; add ppo offpolicy agent pipeline
* Add wandb monitor evaluate return std
* polish deploy method
* format code
* polish code
* polish pg pendulum+hopper config
* fix data shape bug
* fix ppo offpolicy deploy bug
* fix mujoco reward action env clip bug
* fix mujoco reward action env clip bug
* fix deploy env mode bug
* fix env reset bug for deployment and evaluation
* Add ppo offpolicy atari config
* polish config
* polish config code
* polish code; add SQL
* polish code
* change config path
* add compatibility fix for nstep
* polish code
* Add ppo offpolicy continuous policy
* polish config
* add ppo offpolicy general action modeling
* add dependencies
* polish config
* polish deploy
* Add array video helper
* polish deploy
* polish config
* polish setup
* fix config bug
* polish code
* polish code
* polish code
* fix bug in evaluator
* polish code
* fix bug in ckpt_saver order
* fix format
* fix bug in reward shape
* format type
* polish code
* fix nstep error for ppo offpolicy
* fix bug in action shape of cql when dim is 1
* polish code
* delete config not work
* polish code; remove ppof general datatype
* remove useless code
* polish code
* polish code
* fix a2c unittest
* fix advantages_estimator unittest
* fix combination_argmax_sample unittest
* fix unittest bug
* fix wandb logger unittest bug
* polish code
* move config position
* remove useless config
* polish code
* add unittest for montecarlo_return_estimator
* fix bug in termination checker
* polish code
---------

Co-authored-by: zhangpaipai <[email protected]>
Co-authored-by: Ruoyu Gao <[email protected]>
Co-authored-by: Swain <[email protected]>