Releases · DLR-RM/rl-baselines3-zoo

31 Mar 19:07

araffin

v2.3.0

e06914e

RL-Zoo3 v2.3.0 Latest

Breaking Changes

Updated default hyperparameters for TD3/DDPG to be more consistent with SAC
Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
Upgraded to SB3 >= 2.3.0

Other

Added test dependencies to setup.py (@power-edge)
Simplified dependencies in requirements.txt (removed entries duplicated from setup.py)

Full Changelog: v2.2.1...v2.3.0

Contributors

power-edge

17 Nov 23:39

araffin

v2.2.1

28dc228

RL-Zoo3 v2.2.1

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

Breaking Changes

Removed gym dependency, the package is still required for some pretrained agents.
Upgraded to SB3 >= 2.2.1
Upgraded to Huggingface-SB3 >= 3.0
Upgraded to pytablewriter >= 1.0

New Features

Added --eval-env-kwargs to train.py (@Quentin18)
Added ppo_lstm to hyperparams_opt.py (@technocrat13)

Bug fixes

Upgraded to pybullet_envs_gymnasium>=0.4.0
Removed old hacks (for instance limiting offpolicy algorithms to one env at test time)

Documentation

Other

Updated docker image, removed support for X server
Replaced deprecated optuna.suggest_uniform(...) by optuna.suggest_float(..., low=..., high=...)
Switched to ruff for sorting imports
Updated tests to use shlex.split()
Fixed rl_zoo3/hyperparams_opt.py type hints
Fixed rl_zoo3/exp_manager.py type hints
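
As a rough illustration of why the tests moved to shlex.split(), the sketch below compares it with a naive str.split() on a command line. The command string is a made-up example (only the --algo, --env and --eval-freq flags come from these notes), not a line taken from the test suite.

```python
import shlex

# Hypothetical command line, for illustration only.
cmd = 'python train.py --algo ppo --env "CartPole-v1" --eval-freq 25000'

# shlex.split() follows shell quoting rules, so quotes are stripped
# and a quoted argument stays a single token.
args = shlex.split(cmd)

# A naive whitespace split keeps the literal quote characters.
naive = cmd.split()

print(args)
print(naive[5])
```

The list returned by shlex.split() can be passed directly to subprocess helpers that expect an argument list.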

Contributors

technocrat13 and Quentin18

20 Aug 12:17

araffin

v2.1.0

7f98df9

RL-Zoo3 v2.1.0

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

Breaking Changes

Dropped python 3.7 support
SB3 now requires PyTorch 1.13+
Upgraded to SB3 >= 2.1.0
Upgraded to Huggingface-SB3 >= 2.3
Upgraded to Optuna >= 3.0
Upgraded to cloudpickle >= 2.2.1

New Features

Added python 3.11 support

Full Changelog: v2.0.0...v2.1.0

23 Jun 13:00

araffin

v2.0.0

07f7447

RL-Zoo3 v2.0.0: Gymnasium Support

Warning
Stable-Baselines3 (SB3) v2.0 will be the last version supporting Python 3.7 (end of life in June 2023).
We highly recommend that you upgrade to Python >= 3.8.

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade

Breaking Changes

Fixed bug in HistoryWrapper, now returns the correct obs space limits
Upgraded to SB3 >= 2.0.0
Upgraded to Huggingface-SB3 >= 2.2.5
Upgraded to Gym API 0.26+; RL Zoo3 no longer works with Gym 0.21

New Features

Added Gymnasium support
Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper

Bug fixes

Renamed CarRacing-v1 to CarRacing-v2 in hyperparameters
Huggingface push to hub now accepts a --n-timesteps argument to adjust the length of the video
Fixed record_video steps (before it was stepping in a closed env)

Full Changelog: v1.8.0...v2.0.0

08 Apr 16:09

araffin

v1.8.0

483319b

RL-Zoo3 v1.8.0 : New Documentation, OpenRL Benchmark, Multi-Env HerReplayBuffer

Release 1.8.0 (2023-04-07)

We have run a massive and open source benchmark of all algorithms on all environments from the RL Zoo: Open RL Benchmark

New documentation: https://rl-baselines3-zoo.readthedocs.io/en/master/

Warning
Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
You can find a migration guide here.
If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.

Breaking Changes

Upgraded to SB3 >= 1.8.0
Upgraded to new HerReplayBuffer implementation that supports multiple envs
Removed TimeFeatureWrapper for Panda and Fetch envs, as the new replay buffer should handle timeout.

New Features

Tuned hyperparameters for RecurrentPPO on Swimmer
Documentation is now built using Sphinx and hosted on Read the Docs
Open RL Benchmark

Bug fixes

Set highway-env version to 1.5 and setuptools to v65.5 for the CI
Removed use_auth_token for push to hub util
Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see openai/gym#1304)
Fixed gym-minigrid policy (from MlpPolicy to MultiInputPolicy)

Documentation

Documentation is now built using Sphinx and hosted on Read the Docs: https://rl-baselines3-zoo.readthedocs.io/en/master/

Other

Added support for ruff (fast alternative to flake8) in the Makefile
Removed Gitlab CI file
Replaced deprecated optuna.suggest_loguniform(...) by optuna.suggest_float(..., log=True)
Switched to ruff and pyproject.toml
Removed online_sampling and max_episode_length argument when using HerReplayBuffer
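
To make the meaning of log=True concrete, here is a minimal standard-library sketch of log-uniform sampling: the logarithm of the value is drawn uniformly, so each decade of the range is equally likely. The function name and the bounds are assumptions chosen for illustration; this is not Optuna's implementation.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    if not 0 < low <= high:
        raise ValueError("bounds must satisfy 0 < low <= high")
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
# Illustrative learning-rate range spanning five orders of magnitude.
samples = [sample_log_uniform(rng, 1e-5, 1.0) for _ in range(1000)]

# Roughly half of the samples should fall below the geometric mean.
geometric_mean = math.sqrt(1e-5 * 1.0)
below = sum(s < geometric_mean for s in samples)
print(below)
```

This is the usual choice for parameters such as learning rates, where a plain uniform draw would almost never produce small values.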

10 Jan 22:22

araffin

v1.7.0

acdfc93

RL-Zoo3 v1.7.0 : Added support for python config files

Release 1.7.0 (2023-01-10)

SB3 v1.7.0, added support for python config files

We are currently creating an open source benchmark, please read openrlbenchmark/openrlbenchmark#7 if you want to help

Breaking Changes

The --yaml-file argument was renamed to -conf (--conf-file), as Python files are now supported too
Upgraded to SB3 >= 1.7.0 (changed net_arch=[dict(pi=.., vf=..)] to net_arch=dict(pi=.., vf=..))
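
For configs still written in the old style, the net_arch change can be sketched as a small conversion. The helper name migrate_net_arch is hypothetical and is not part of SB3 or the RL Zoo; it only covers the single-dict case named in the note above.

```python
def migrate_net_arch(net_arch):
    """Convert net_arch=[dict(pi=.., vf=..)] to net_arch=dict(pi=.., vf=..).

    Hypothetical helper: any other value is returned unchanged.
    """
    is_old_style = (
        isinstance(net_arch, list)
        and len(net_arch) == 1
        and isinstance(net_arch[0], dict)
    )
    return net_arch[0] if is_old_style else net_arch


# Layer sizes are illustrative.
old_style = [dict(pi=[64, 64], vf=[64, 64])]
new_style = migrate_net_arch(old_style)
print(new_style)
```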

New Features

Specifying custom policies in yaml file is now supported (@Rick-v-E)
Added monitor_kwargs parameter
Handle the env_kwargs render:True under the hood for panda-gym v1 envs in enjoy replay, to match the visualization behavior of other envs
Added support for python config file
Tuned hyperparameters for PPO on Swimmer
Added -tags/--wandb-tags argument to train.py to add tags to the wandb run
Added a sb3 version tag to the wandb run
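
A Python config file, as introduced in this release, might look like the sketch below. It assumes the file exposes a module-level hyperparams dict keyed by environment id, mirroring the yaml layout; the environment and all values are illustrative, not tuned settings.

```python
# Hypothetical config file, e.g. my_config.py.
# Assumption: the zoo reads a module-level `hyperparams` dict
# keyed by environment id, like the entries of the yaml files.
hyperparams = {
    "MountainCarContinuous-v0": dict(
        normalize=True,
        n_envs=1,
        n_timesteps=20000.0,
        policy="MlpPolicy",
    ),
}

print(sorted(hyperparams["MountainCarContinuous-v0"]))
```

Such a file would then be passed to train.py through the -conf (--conf-file) argument.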

Bug fixes

Allow python -m rl_zoo3.cli to be called directly
Fixed a bug where custom environments were not found despite passing --gym-package when using subprocesses
Fixed TRPO hyperparameters for MinitaurBulletEnv-v0, MinitaurBulletDuckEnv-v0, HumanoidBulletEnv-v0, InvertedDoublePendulumBulletEnv-v0 and InvertedPendulumSwingupBulletEnv

Documentation

Other

scripts/plot_train.py plots models such that newer models appear on top of older ones.
Added additional type checking using mypy
Standardized the use of from gym import spaces

Contributors

Rick-v-E

03 Oct 16:13

araffin

v1.6.2

b372e9a

RL-Zoo3 v1.6.2: The RL Zoo is now a package!

Highlights

You can use the RL Zoo from outside, for instance with the experimental Stable Baselines3 Jax version (SBX).

File: train.py (you can use python train.py --algo sbx_tqc --env Pendulum-v1 afterward)

import rl_zoo3
import rl_zoo3.train
from rl_zoo3.train import train

from sbx import TQC

# Add new algorithm
rl_zoo3.ALGOS["sbx_tqc"] = TQC
rl_zoo3.train.ALGOS = rl_zoo3.ALGOS
rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS

if __name__ == "__main__":
    train()

Breaking Changes

RL Zoo is now a python package
low pass filter was removed

New Features

RL Zoo cli: rl_zoo3 train and rl_zoo3 enjoy

30 Sep 12:32

araffin

v1.6.1

8600d80

SB3 v1.6.1: Progress bar and custom yaml file

Breaking Changes

Upgraded to Stable-Baselines3 (SB3) >= 1.6.1
Upgraded to sb3-contrib >= 1.6.1

New Features

Added --yaml-file argument option for train.py to read hyperparameters from custom yaml files (@JohannesUl)

Bug fixes

Added custom_object parameter on record_video.py (@Affonso-Gui)
Changed optimize_memory_usage to False for DQN/QR-DQN on record_video.py (@Affonso-Gui)
In ExperimentManager _maybe_normalize, set training to False for eval envs to prevent normalization stats from being updated in eval envs (e.g. in EvalCallback) (@pchalasani)
Only one env is used to get the action space while optimizing hyperparameters and it is correctly closed (@SammyRamone)
Added progress bar via the -P argument using tqdm and rich

Contributors

pchalasani, SammyRamone, and 2 other contributors

17 Aug 15:47

araffin

v1.6.0

89d4e0c

SB3 v1.6.0: Huggingface hub integration, Recurrent PPO (PPO LSTM)

Release 1.6.0 (2022-08-05)

Breaking Changes

Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
Updated default --eval-freq from 10k to 25k steps
Update default horizon to 2 for the HistoryWrapper
Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
Upgrade to sb3-contrib >= 1.6.0
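
The rule of one pruning evaluation per 100k time steps can be sketched as below. The function name and the floor of one evaluation for short runs are assumptions made for illustration, not a copy of the zoo's code.

```python
def n_pruning_evaluations(n_timesteps, steps_per_eval=100_000):
    """Number of intermediate evaluations: one per 100k time steps.

    Assumption: short runs still get at least one evaluation.
    """
    return max(1, n_timesteps // steps_per_eval)


print(n_pruning_evaluations(1_000_000))
print(n_pruning_evaluations(50_000))
```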

New Features

Support setting PyTorch's device with the --device flag (@Gregwar)
Add --max-total-trials parameter to help with distributed optimization. (@ernestum)
Added vec_env_wrapper support in the config (works the same as env_wrapper)
Added Huggingface hub integration
Added RecurrentPPO support (aka ppo_lstm)
Added autodownload for "official" sb3 models from the hub
Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
Added MsPacman models

Bug fixes

Fix Reacher-v3 name in PPO hyperparameter file
Pinned ale-py==0.7.4 until new SB3 version is released
Fix enjoy / record videos with LSTM policy
Fix bug with environments that have a slash in their name (@ernestum)
Changed optimize_memory_usage to False for DQN/QR-DQN on Atari games; if you want to save RAM, you need to deactivate handle_timeout_termination in the replay_buffer_kwargs
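
To re-enable the memory-saving replay buffer, the last bug-fix note suggests an override along these lines. This is a sketch of the keyword arguments only, written as a Python dict; the variable name is hypothetical and the surrounding config format is up to you.

```python
# Hypothetical overrides for DQN/QR-DQN on Atari, following the note above:
# optimize_memory_usage saves RAM, but handle_timeout_termination
# has to be switched off in replay_buffer_kwargs at the same time.
memory_saving_overrides = dict(
    optimize_memory_usage=True,
    replay_buffer_kwargs=dict(handle_timeout_termination=False),
)

print(memory_saving_overrides)
```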

Documentation

Other

When pruner is set to "none", use NopPruner instead of diverted MedianPruner (@qgallouedec)

Contributors

Gregwar, ernestum, and 2 other contributors

25 Mar 14:21

araffin

v1.5.0

2fe4418

SB3 v1.5.0: Support for Weights and Biases experiment tracking

Release 1.5.0 (2022-03-25)

Support for Weights and Biases experiment tracking

Breaking Changes

Upgrade to Stable-Baselines3 (SB3) >= 1.5.0
Upgrade to sb3-contrib >= 1.5.0
Upgraded to gym 0.21

New Features

Verbose mode for each trial (when doing hyperparam optimization) can now be activated using the debug mode (verbose == 2)
Support experiment tracking with Weights and Biases via the --track flag (@vwxyzjn)
Support tracking raw episodic stats via RawStatisticsCallback (@vwxyzjn, see #216)

Bug fixes

Policies saved during optimization with distributed Optuna now load on new systems (@JKTerry)
Fixed script for recording video that was not up to date with the enjoy script

Contributors

JKTerry and vwxyzjn
