Consistency Models for RL — Official PyTorch Implementation

Official implementation for:

Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
Zihan Ding, Chi Jin
https://arxiv.org/abs/2309.16984

Requirements

PyTorch, MuJoCo, and D4RL need to be installed. See requirements.txt for details on the environment setup:

pip install -r requirements.txt
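
After installing, you can sanity-check the core dependencies with a short import snippet. This is a minimal sketch that only assumes the standard torch, gym, and d4rl packages listed in requirements.txt:

# Quick sanity check that the core dependencies import correctly.
import torch
import gym
import d4rl  # noqa: F401  (importing d4rl registers its offline-RL environments)

print('torch', torch.__version__, '| gym', gym.__version__, '| CUDA:', torch.cuda.is_available())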

Run

You can train with either a diffusion-model policy (Diffusion-QL) or a consistency-model policy (Consistency-AC).
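
At sampling time the key difference is that a diffusion policy denoises an action over many reverse steps, while a consistency policy maps noise to an action in a single forward pass. Below is a minimal, hypothetical PyTorch sketch of that one-step action generation; the ConsistencyPolicy class, MLP sizes, and noise-schedule constants are illustrative and are not the classes used in this repo.

import torch
import torch.nn as nn

class ConsistencyPolicy(nn.Module):
    # Minimal sketch of a state-conditioned consistency function f(x, sigma | s)
    # that maps a noisy action x at noise level sigma directly to a clean action.
    # sigma_data / sigma_min / sigma_max follow the usual consistency-model
    # parameterization; the MLP and the dimensions below are illustrative only.
    def __init__(self, state_dim, action_dim, sigma_data=0.5, sigma_min=0.002, sigma_max=80.0):
        super().__init__()
        self.action_dim = action_dim
        self.sigma_data, self.sigma_min, self.sigma_max = sigma_data, sigma_min, sigma_max
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, 256), nn.Mish(),
            nn.Linear(256, 256), nn.Mish(),
            nn.Linear(256, action_dim),
        )

    def forward(self, x, sigma, state):
        # Boundary-condition scalings: f(x, sigma_min | s) = x by construction.
        c_skip = self.sigma_data ** 2 / ((sigma - self.sigma_min) ** 2 + self.sigma_data ** 2)
        c_out = self.sigma_data * (sigma - self.sigma_min) / (sigma ** 2 + self.sigma_data ** 2).sqrt()
        h = torch.cat([x, state, sigma.log().expand(x.shape[0], 1)], dim=-1)
        return c_skip * x + c_out * self.net(h)

    @torch.no_grad()
    def act(self, state):
        # One-step action generation: map pure noise at sigma_max to an action.
        sigma = torch.full((state.shape[0], 1), self.sigma_max)
        noise = torch.randn(state.shape[0], self.action_dim) * self.sigma_max
        return self.forward(noise, sigma, state).clamp(-1.0, 1.0)

# Illustrative dimensions (roughly Hopper-sized); not tied to the repo's configs.
policy = ConsistencyPolicy(state_dim=11, action_dim=3)
print(policy.act(torch.randn(2, 11)).shape)  # torch.Size([2, 3])

The c_skip and c_out scalings are the standard consistency-model boundary conditions, which is what allows the single-step mapping from noise to action.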

Dataset

First, download the D4RL datasets with:

python download_data.py

The data will be saved in ./dataset/.
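
For reference, the D4RL data itself is exposed through the d4rl package; below is a minimal sketch of the standard loading API (download_data.py may cache the data differently under ./dataset/):

import gym
import d4rl  # importing d4rl registers the offline environments with gym

env = gym.make('hopper-medium-v2')
data = d4rl.qlearning_dataset(env)  # dict of observations, actions, rewards, terminals, next_observations
print(data['observations'].shape, data['actions'].shape)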

Offline RL

# train offline RL Consistency-AC on the hopper-medium-v2 task
python offline.py --env_name hopper-medium-v2 --model consistency --ms offline --exp RUN_NAME --save_best_model --lr_decay
# train offline RL Diffusion-QL on the walker2d-medium-expert-v2 task
python offline.py --env_name walker2d-medium-expert-v2 --model diffusion --ms offline --exp RUN_NAME --save_best_model --lr_decay
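
Offline results on D4RL tasks are conventionally reported as normalized scores (0 = random policy, 100 = expert policy). The sketch below shows how that metric is computed with the standard D4RL helper; the random actions merely stand in for a trained Consistency-AC or Diffusion-QL actor.

import gym
import d4rl

env = gym.make('hopper-medium-v2')
obs, ep_return, done = env.reset(), 0.0, False
while not done:
    action = env.action_space.sample()  # stand-in for a trained actor's action
    obs, reward, done, _ = env.step(action)
    ep_return += reward
# D4RL convention: 0 corresponds to a random policy, 100 to an expert policy.
print('normalized score:', 100 * env.get_normalized_score(ep_return))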

Online RL

From scratch:

# train online RL Consistency-AC on the hopper-medium-v2 task
python online.py --env_name hopper-medium-v2 --num_envs 3 --model consistency --exp RUN_NAME
# train online RL Diffusion-QL on the walker2d-medium-expert-v2 task
python online.py --env_name walker2d-medium-expert-v2 --num_envs 3 --model diffusion --exp RUN_NAME

Online RL initialized with offline pre-trained models (offline-to-online):

python online.py --env_name kitchen-mixed-v0 --num_envs 3 --model consistency --exp online_test --load_model 'results/**PATH**' --load_id 'online'

For example, if a model is saved at results/**PATH**/actor_online.pth, the command above loads it to initialize online training.

Training Scripts

Use bash scripts:

bash scripts/offline.sh
bash scripts/online.sh

Use Slurm scripts:

sbatch scripts/offline.slurm
sbatch scripts/online.slurm
sbatch scripts/offline2online.slurm

Citation

If you find this open-source release useful, please cite it in your paper:

@article{ding2023consistency,
  title={Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning},
  author={Ding, Zihan and Jin, Chi},
  journal={arXiv preprint arXiv:2309.16984},
  year={2023}
}

Acknowledgement

We acknowledge the official repository of Diffusion-QL and the corresponding paper: https://arxiv.org/abs/2208.06193.
