This repository is the official implementation of Deep Variance Weighting (DVW) for the MinAtar experiments in "Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice".
- We modified the CleanRL repository (commit d67ae0c) and the MinAtar repository (commit 548b136).
- The implementation of M-DQN with DVW is in cleanrl/dqn_minatar.py (or cleanrl/dqn.py). All other files are unchanged from the original CleanRL.
- Step 1: Install dependencies
# make sure you are in Variance-Weighted-MDVI/Deep-Variance-Weighting-MinAtar
poetry install
# Install MinAtar from the submodule
poetry shell
git submodule update --init && cd MinAtar
pip install -e .
- Step 2: Login to wandb (for ease of visualization and plotting)
wandb login # only required for the first time
You can test whether everything works by running:
# If you have trouble with the GPU, replace "--device cuda" with "--device cpu"
# Weighted M-DQN
poetry run python cleanrl/dqn_minatar.py --total-timesteps 50000 --env-id breakout --track --wandb-project-name minatar-test --exp-name Weight-Net-M-DQN --weight-type variance-net --device cuda
# M-DQN
poetry run python cleanrl/dqn_minatar.py --total-timesteps 50000 --env-id breakout --track --wandb-project-name minatar-test --exp-name M-DQN --weight-type none --device cuda
# Weighted DQN
poetry run python cleanrl/dqn_minatar.py --total-timesteps 50000 --env-id breakout --track --wandb-project-name minatar-test --exp-name Weight-Net-DQN --weight-type variance-net --kl-coef 0.0 --ent-coef 0.0 --device cuda
# DQN
poetry run python cleanrl/dqn_minatar.py --total-timesteps 50000 --env-id breakout --track --wandb-project-name minatar-test --exp-name DQN --weight-type none --kl-coef 0.0 --ent-coef 0.0 --device cuda
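The four test commands above differ only in --weight-type and in whether --kl-coef and --ent-coef are set to 0.0. As a rough sketch of that mapping, the following Python helper builds the command strings without running anything. The VARIANTS table and build_command are hypothetical names that do not exist in this repository, and the --exp-name labels used here are arbitrary; only the flags and values are taken from the commands above.

```python
import shlex

# Hypothetical table (not part of the repository): run label -> variant flags.
# Flag names and values are copied from the test commands above.
VARIANTS = {
    "Weighted-M-DQN": ["--weight-type", "variance-net"],
    "M-DQN": ["--weight-type", "none"],
    "Weighted-DQN": ["--weight-type", "variance-net",
                     "--kl-coef", "0.0", "--ent-coef", "0.0"],
    "DQN": ["--weight-type", "none",
            "--kl-coef", "0.0", "--ent-coef", "0.0"],
}


def build_command(variant, script="cleanrl/dqn_minatar.py", env_id="breakout",
                  project="minatar-test", total_timesteps=50000, device="cuda"):
    """Return the shell command for one variant as a single string."""
    args = [
        "poetry", "run", "python", script,
        "--total-timesteps", str(total_timesteps),
        "--env-id", env_id,
        "--track",
        "--wandb-project-name", project,
        "--exp-name", variant,  # arbitrary label, assumed here
        *VARIANTS[variant],
        "--device", device,
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # Print the commands so they can be inspected or pasted into a shell.
    for name in VARIANTS:
        print(build_command(name))
```

Passing script="cleanrl/dqn.py", env_id="CartPole-v1", and project="classic-control-test" would produce the classic control commands in the same way.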
To run the MinAtar experiments, run bash run_minatar.bash
To plot the results, run all the cells in minatar-results/result-plotter.ipynb. The figures will be saved in the minatar-results directory.
If you are interested in other environments, try the following commands for classic control tasks:
# If you have trouble with the GPU, replace "--device cuda" with "--device cpu"
# Weighted M-DQN
poetry run python cleanrl/dqn.py --total-timesteps 50000 --env-id CartPole-v1 --track --wandb-project-name classic-control-test --exp-name Weight-Net-M-DQN --weight-type variance-net --device cuda
# M-DQN
poetry run python cleanrl/dqn.py --total-timesteps 50000 --env-id CartPole-v1 --track --wandb-project-name classic-control-test --exp-name M-DQN --weight-type none --device cuda
# Weighted DQN
poetry run python cleanrl/dqn.py --total-timesteps 50000 --env-id CartPole-v1 --track --wandb-project-name classic-control-test --exp-name Weight-Net-DQN --weight-type variance-net --kl-coef 0.0 --ent-coef 0.0 --device cuda
# DQN
poetry run python cleanrl/dqn.py --total-timesteps 50000 --env-id CartPole-v1 --track --wandb-project-name classic-control-test --exp-name DQN --weight-type none --kl-coef 0.0 --ent-coef 0.0 --device cuda
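The comments above suggest switching "--device cuda" to "--device cpu" when the GPU does not work. One way to decide which value to pass is to ask PyTorch directly. This is a minimal sketch that assumes the scripts run on PyTorch, as upstream CleanRL does; pick_device is a hypothetical helper that is not part of this repository.

```python
def pick_device():
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # assumed to be installed by `poetry install`
    except ImportError:
        # Without PyTorch there is nothing to query, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # The printed value can be passed to the --device flag.
    print(pick_device())
```

Run it inside the poetry environment so that it queries the same PyTorch installation the training scripts use.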