Releases: DLR-RM/rl-baselines3-zoo
RL-Zoo3 v2.3.0
Breaking Changes
- Updated default hyperparameters for TD3/DDPG to be more consistent with SAC
- Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
- Upgraded to SB3 >= 2.3.0
Other
- Added test dependencies to setup.py (@power-edge)
- Simplified dependencies of requirements.txt (removed duplicates from setup.py)
Full Changelog: v2.2.1...v2.3.0
RL-Zoo3 v2.2.1
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
Breaking Changes
- Removed gym dependency, the package is still required for some pretrained agents.
- Upgraded to SB3 >= 2.2.1
- Upgraded to Huggingface-SB3 >= 3.0
- Upgraded to pytablewriter >= 1.0
New Features
- Added --eval-env-kwargs to train.py (@Quentin18)
- Added ppo_lstm to hyperparams_opt.py (@technocrat13)
Bug fixes
- Upgraded to pybullet_envs_gymnasium>=0.4.0
- Removed old hacks (for instance limiting off-policy algorithms to one env at test time)
Documentation
Other
- Updated docker image, removed support for X server
- Replaced deprecated optuna.suggest_uniform(...) by optuna.suggest_float(..., low=..., high=...)
- Switched to ruff for sorting imports
- Updated tests to use shlex.split()
- Fixed rl_zoo3/hyperparams_opt.py type hints
- Fixed rl_zoo3/exp_manager.py type hints
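The move to shlex.split() in the tests can be illustrated with the standard library alone. The snippet below is a minimal sketch, not code from the RL Zoo test suite; the command line and the tag values are hypothetical and only show why shell-style splitting is safer than a plain str.split() when an argument contains spaces.

```python
import shlex

# Hypothetical command line with a quoted argument that contains a space.
cmd = 'python train.py --algo ppo --env CartPole-v1 --wandb-tags "first tag" baseline'

# str.split() cuts at every space and keeps the quote characters.
naive_args = cmd.split()

# shlex.split() follows shell quoting rules, so the quoted value stays whole.
args = shlex.split(cmd)

print(naive_args[7:])  # the quoted tag is split into two broken pieces
print(args[7:])        # the quoted tag is a single argument
```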
RL-Zoo3 v2.1.0
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
Breaking Changes
- Dropped python 3.7 support
- SB3 now requires PyTorch 1.13+
- Upgraded to SB3 >= 2.1.0
- Upgraded to Huggingface-SB3 >= 2.3
- Upgraded to Optuna >= 3.0
- Upgraded to cloudpickle >= 2.2.1
New Features
- Added python 3.11 support
Full Changelog: v2.0.0...v2.1.0
RL-Zoo3 v2.0.0: Gymnasium Support
Warning
Stable-Baselines3 (SB3) v2.0 will be the last one supporting python 3.7 (end of life in June 2023).
We highly recommend upgrading to Python >= 3.8.
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes
- Fixed bug in HistoryWrapper, now returns the correct obs space limits
- Upgraded to SB3 >= 2.0.0
- Upgraded to Huggingface-SB3 >= 2.2.5
- Upgraded to Gym API 0.26+, RL Zoo3 doesn't work anymore with Gym 0.21
New Features
- Added Gymnasium support
- Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper
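For readers unfamiliar with the API change behind this release: from Gym 0.26 on, step() returns five values (obs, reward, terminated, truncated, info) instead of four (obs, reward, done, info). The sketch below is not the patch shipped in RL Zoo3; the helper name to_new_step_api is hypothetical, and it assumes the old convention where a time-limit ending is flagged by the "TimeLimit.truncated" key in the info dict.

```python
from typing import Any, Dict, Tuple

OldStep = Tuple[Any, float, bool, Dict[str, Any]]
NewStep = Tuple[Any, float, bool, bool, Dict[str, Any]]


def to_new_step_api(step: OldStep) -> NewStep:
    """Convert an old 4-tuple step result into the 5-tuple form (hypothetical helper)."""
    obs, reward, done, info = step
    # Assumption: the old TimeLimit wrapper marked timeouts in the info dict.
    truncated = bool(info.get("TimeLimit.truncated", False))
    # An episode that ended without a timeout is a true termination.
    terminated = done and not truncated
    return obs, reward, terminated, truncated, info


timeout_step = to_new_step_api((0, 1.0, True, {"TimeLimit.truncated": True}))
failure_step = to_new_step_api((0, -1.0, True, {}))
running_step = to_new_step_api((0, 0.0, False, {}))
```

Keeping the two flags separate matters for value bootstrapping: a truncated episode did not reach a terminal state, so the value of the next state should still be taken into account.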
Bug fixes
- Renamed CarRacing-v1 to CarRacing-v2 in hyperparameters
- Huggingface push to hub now accepts a --n-timesteps argument to adjust the length of the video
- Fixed record_video steps (before it was stepping in a closed env)
Full Changelog: v1.8.0...v2.0.0
RL-Zoo3 v1.8.0 : New Documentation, OpenRL Benchmark, Multi-Env HerReplayBuffer
Release 1.8.0 (2023-04-07)
We have run a massive and open source benchmark of all algorithms on all environments from the RL Zoo: Open RL Benchmark
New documentation: https://rl-baselines3-zoo.readthedocs.io/en/master/
Warning
Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
You can find a migration guide here.
If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.
Breaking Changes
- Upgraded to SB3 >= 1.8.0
- Upgraded to new HerReplayBuffer implementation that supports multiple envs
- Removed TimeFeatureWrapper for Panda and Fetch envs, as the new replay buffer should handle timeout.
New Features
- Tuned hyperparameters for RecurrentPPO on Swimmer
- Documentation is now built using Sphinx and hosted on Read the Docs
- Open RL Benchmark
Bug fixes
- Set highway-env version to 1.5 and setuptools to v65.5 for the CI
- Removed use_auth_token for push to hub util
- Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see openai/gym#1304)
- Fixed gym-minigrid policy (from MlpPolicy to MultiInputPolicy)
Documentation
- Documentation is now built using Sphinx and hosted on Read the Docs: https://rl-baselines3-zoo.readthedocs.io/en/master/
Other
- Added support for ruff (fast alternative to flake8) in the Makefile
- Removed Gitlab CI file
- Replaced deprecated optuna.suggest_loguniform(...) by optuna.suggest_float(..., log=True)
- Switched to ruff and pyproject.toml
- Removed online_sampling and max_episode_length arguments when using HerReplayBuffer
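As background for the suggest_loguniform replacement: passing log=True asks for values sampled uniformly in the log domain, which suits parameters such as learning rates that span several orders of magnitude. The snippet below only illustrates that idea with the standard library; it is not Optuna's implementation, and the function name and bounds are hypothetical.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
learning_rates = [sample_log_uniform(1e-5, 1e-1, rng) for _ in range(1000)]

# About half of the samples fall below the geometric mean of the bounds (1e-3),
# whereas a plain uniform draw would put almost all of them above it.
below_geometric_mean = sum(lr < 1e-3 for lr in learning_rates)
```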
RL-Zoo3 v1.7.0 : Added support for python config files
Release 1.7.0 (2023-01-10)
SB3 v1.7.0, added support for python config files
We are currently creating an open source benchmark, please read openrlbenchmark/openrlbenchmark#7 if you want to help
Breaking Changes
- --yaml-file argument was renamed to -conf (--conf-file) as Python files are now supported too
- Upgraded to SB3 >= 1.7.0 (changed net_arch=[dict(pi=.., vf=..)] to net_arch=dict(pi=.., vf=..))
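Existing configs that still use the old net_arch form need the surrounding list removed. The helper below is a hypothetical sketch of that migration, not part of RL Zoo3 or SB3; it only unwraps a list that contains a single dict and leaves every other value untouched.

```python
from typing import Any, Dict, List, Union

NetArch = Union[List[Any], Dict[str, List[int]]]


def upgrade_net_arch(net_arch: NetArch) -> NetArch:
    """Unwrap the old [dict(pi=.., vf=..)] form (hypothetical helper)."""
    is_old_form = (
        isinstance(net_arch, list)
        and len(net_arch) == 1
        and isinstance(net_arch[0], dict)
    )
    # Plain layer lists such as [64, 64] and already-upgraded dicts pass through.
    return net_arch[0] if is_old_form else net_arch


old_style = [dict(pi=[64, 64], vf=[64, 64])]
new_style = upgrade_net_arch(old_style)
```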
New Features
- Specifying custom policies in yaml file is now supported (@Rick-v-E)
- Added monitor_kwargs parameter
- Handle the env_kwargs of render:True under the hood for panda-gym v1 envs in enjoy replay to match visualization behavior of other envs
- Added support for python config file
- Tuned hyperparameters for PPO on Swimmer
- Added -tags/--wandb-tags argument to train.py to add tags to the wandb run
- Added a sb3 version tag to the wandb run
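A Python config file might look like the sketch below. It assumes the layout used by the example config in the repository, namely a module-level hyperparams dict keyed by environment id; the environment and all values here are placeholders, not tuned hyperparameters.

```python
# Hypothetical config file, e.g. my_ppo_config.py.
# Assumption: the zoo reads a module-level dict named `hyperparams`,
# keyed by environment id, with the same keys as the yaml files.
hyperparams = {
    "Pendulum-v1": dict(
        n_envs=4,
        n_timesteps=100_000,
        policy="MlpPolicy",
        learning_rate=1e-3,
        gamma=0.9,
        # net_arch uses the dict form required by SB3 >= 1.7.0.
        policy_kwargs=dict(net_arch=dict(pi=[64, 64], vf=[64, 64])),
    )
}
```

Compared with yaml, a Python file allows values to be computed or imported, at the cost of executing arbitrary code when the config is loaded.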
Bug fixes
- Allow python -m rl_zoo3.cli to be called directly
- Fixed a bug where custom environments were not found despite passing --gym-package when using subprocesses
- Fixed TRPO hyperparameters for MinitaurBulletEnv-v0, MinitaurBulletDuckEnv-v0, HumanoidBulletEnv-v0, InvertedDoublePendulumBulletEnv-v0 and InvertedPendulumSwingupBulletEnv
Documentation
Other
- scripts/plot_train.py plots models such that newer models appear on top of older ones.
- Added additional type checking using mypy
- Standardized the use of from gym import spaces
RL-Zoo3 v1.6.2: The RL Zoo is now a package!
Highlights
You can now install the RL Zoo via pip: pip install rl-zoo3 and it has a basic command line interface (rl_zoo3 train|enjoy|plot_train|all_plots) that has the same interface as the scripts (train.py|enjoy.py|...).
You can use the RL Zoo from outside, for instance with the experimental Stable Baselines3 Jax version (SBX).
File: train.py (you can use python train.py --algo sbx_tqc --env Pendulum-v1 afterward)
import rl_zoo3
import rl_zoo3.train
from rl_zoo3.train import train
from sbx import TQC
# Add new algorithm
rl_zoo3.ALGOS["sbx_tqc"] = TQC
rl_zoo3.train.ALGOS = rl_zoo3.ALGOS
rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS
if __name__ == "__main__":
    train()
Breaking Changes
- RL Zoo is now a python package
- low pass filter was removed
New Features
- RL Zoo cli: rl_zoo3 train and rl_zoo3 enjoy
SB3 v1.6.1: Progress bar and custom yaml file
Breaking Changes
- Upgraded to Stable-Baselines3 (SB3) >= 1.6.1
- Upgraded to sb3-contrib >= 1.6.1
New Features
- Added --yaml-file argument option for train.py to read hyperparameters from custom yaml files (@JohannesUl)
Bug fixes
- Added custom_object parameter on record_video.py (@Affonso-Gui)
- Changed optimize_memory_usage to False for DQN/QR-DQN on record_video.py (@Affonso-Gui)
- In ExperimentManager _maybe_normalize set training to False for eval envs, to prevent normalization stats from being updated in eval envs (e.g. in EvalCallback) (@pchalasani).
- Only one env is used to get the action space while optimizing hyperparameters and it is correctly closed (@SammyRamone)
- Added progress bar via the -P argument using tqdm and rich
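The normalization fix can be pictured with a toy example. The class below is a hypothetical stand-in for a normalizing wrapper, not SB3's VecNormalize; it only shows why an evaluation copy should reuse the training statistics with training set to False, so that evaluation data never shifts them.

```python
class RunningMeanNormalizer:
    """Toy normalizer that tracks a running mean (illustrative only)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.training = True

    def normalize(self, value: float) -> float:
        if self.training:
            # Incremental mean update, skipped when training is False.
            self.count += 1
            self.mean += (value - self.mean) / self.count
        return value - self.mean


# Statistics are collected during training.
train_norm = RunningMeanNormalizer()
for reward in (1.0, 2.0, 3.0):
    train_norm.normalize(reward)

# The eval copy reuses the statistics but must not update them.
eval_norm = RunningMeanNormalizer()
eval_norm.mean, eval_norm.count = train_norm.mean, train_norm.count
eval_norm.training = False
centered = eval_norm.normalize(100.0)
```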
SB3 v1.6.0: Huggingface hub integration, Recurrent PPO (PPO LSTM)
Release 1.6.0 (2022-08-05)
Breaking Changes
- Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
- Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
- Updated default --eval-freq from 10k to 25k steps
- Update default horizon to 2 for the HistoryWrapper
- Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
- Upgrade to sb3-contrib >= 1.6.0
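The rule of one pruning evaluation per 100k time steps amounts to a simple integer division. The function below is a sketch under that reading; its name is hypothetical, and the lower bound of one evaluation for short runs is an assumption rather than something stated in these notes.

```python
def n_pruning_evaluations(n_timesteps: int, steps_per_eval: int = 100_000) -> int:
    """Number of intermediate evaluations for a trial (hypothetical helper)."""
    # Assumption: short runs still get at least one evaluation.
    return max(1, n_timesteps // steps_per_eval)


evals_long_run = n_pruning_evaluations(1_000_000)
evals_short_run = n_pruning_evaluations(50_000)
```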
New Features
- Support setting PyTorch's device with the --device flag (@Gregwar)
- Add --max-total-trials parameter to help with distributed optimization. (@ernestum)
- Added vec_env_wrapper support in the config (works the same as env_wrapper)
- Added Huggingface hub integration
- Added RecurrentPPO support (aka ppo_lstm)
- Added autodownload for "official" sb3 models from the hub
- Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
- Added MsPacman models
Bug fixes
- Fix Reacher-v3 name in PPO hyperparameter file
- Pinned ale-py==0.7.4 until new SB3 version is released
- Fix enjoy / record videos with LSTM policy
- Fix bug with environments that have a slash in their name (@ernestum)
- Changed optimize_memory_usage to False for DQN/QR-DQN on Atari games; if you want to save RAM, you need to deactivate handle_timeout_termination in the replay_buffer_kwargs
Documentation
Other
- When pruner is set to "none", use NopPruner instead of diverted MedianPruner (@qgallouedec)
SB3 v1.5.0: Support for Weights and Biases experiment tracking
Release 1.5.0 (2022-03-25)
Breaking Changes
- Upgrade to Stable-Baselines3 (SB3) >= 1.5.0
- Upgrade to sb3-contrib >= 1.5.0
- Upgraded to gym 0.21
New Features
- Verbose mode for each trial (when doing hyperparam optimization) can now be activated using the debug mode (verbose == 2)
- Support experiment tracking via Weights and Biases via the --track flag (@vwxyzjn)
- Support tracking raw episodic stats via RawStatisticsCallback (@vwxyzjn, see #216)
Bug fixes
- Policies saved during optimization with distributed Optuna load on new systems (@JKTerry)
- Fixed script for recording video that was not up to date with the enjoy script