
Reinforcement Learning

This is a place to play with RL algorithms. This repo is based on the original SB3-Contrib.


Setup

  1. Create a conda env (I used Python 3.8)
  2. Install the dependencies:
pip install stable_baselines3 pyyaml tensorboard tqdm rich
  3. Install this repo in editable mode:
pip install -e .
  4. Install any additional environments you want to use, for example the gymnasium Box2D environments:
pip install swig
pip install 'gymnasium[box2d]'

or MuJoCo:

pip install 'cython<2'
pip install mujoco mujoco_py

For MuJoCo, also follow the instructions at https://github.com/openai/mujoco-py to install the MuJoCo binaries on your system.
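
To verify the setup, here is a minimal training sketch using the standard Stable-Baselines3 API. It assumes the Box2D extras above are installed; on newer gymnasium releases the environment ID is LunarLander-v3 rather than LunarLander-v2.

```python
# Minimal sanity check: train PPO briefly on a Box2D environment.
# Assumes gymnasium[box2d] is installed; on newer gymnasium releases
# the environment ID is "LunarLander-v3".
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "LunarLander-v2", verbose=1)
model.learn(total_timesteps=10_000)
model.save("ppo_lunarlander")  # hypothetical output path
```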

NeurDS-Lab Visual RL Environments

You might also find the training environments we use for studying representation learning interesting. These are mostly visual RL environments (RL from pixel observations).

  1. DeepMind Control Suite. Install with pip install dm_control (you will need MuJoCo from above for the basic suite of environments, though); see the sketch after this list. Custom environments based on the Unsupervised RL Benchmark and OpenAI Gym are coming soon.
  2. DeepMind Lab. You can use our Apptainer image: https://gitlab.mpcdf.mpg.de/mpcdf-dataanalytics/deepmind-lab/
  3. Causal World (Apptainer image coming soon)
  4. Topoworld (Custom maze environments with varying topological complexity based on Minigrid and Miniworld): https://github.com/milosen/topoworld
  5. Atari Benchmark (Ask Charlotte)
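
For the visual setting, here is a minimal sketch of loading a DeepMind Control Suite task and rendering pixel observations. It assumes dm_control and MuJoCo are installed as described above; the 84x84 resolution is just a common choice, not a requirement.

```python
# Minimal sketch: load a dm_control task and render pixel observations.
from dm_control import suite

env = suite.load(domain_name="cartpole", task_name="swingup")
time_step = env.reset()
# Render pixels straight from the physics engine; 84x84 is a common
# resolution for RL from pixel observations.
pixels = env.physics.render(height=84, width=84, camera_id=0)
print(pixels.shape)  # (84, 84, 3)
```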

Stable-Baselines3 - Contrib (SB3-Contrib)

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code. "sb3-contrib" for short.

What is SB3-Contrib?

A place for RL algorithms and tools that are considered experimental, e.g. implementations of the latest publications. The goal is to keep the simplicity, documentation and style of Stable-Baselines3, but for less mature implementations.

Why create this repository?

Over the span of Stable-Baselines and Stable-Baselines3, the community has been eager to contribute in the form of better logging utilities, environment wrappers, extended support (e.g. for different action spaces) and learning algorithms.

However, sometimes these utilities were too niche to be considered for Stable-Baselines, or proved too difficult to integrate well into the existing code without creating a mess. sb3-contrib aims to fix this by not requiring the neatest integration with existing code and not setting limits on what is too niche: almost everything remotely useful goes! We hope this allows us to provide reliable implementations following the usual Stable-Baselines standards (consistent style, documentation, etc.) beyond the relatively small scope of utilities in the main repository.

Features

See documentation for the full list of included features.

RL Algorithms:

  - Augmented Random Search (ARS)
  - Quantile Regression DQN (QR-DQN)
  - PPO with invalid action masking (MaskablePPO)
  - PPO with recurrent policy (RecurrentPPO)
  - Truncated Quantile Critics (TQC)
  - Trust Region Policy Optimization (TRPO)

Gym Wrappers:

  - Time Feature Wrapper
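
As an illustration, the Time Feature Wrapper is applied like any other Gym wrapper. This is a sketch assuming a gymnasium-compatible sb3-contrib release:

```python
# Sketch: TimeFeatureWrapper appends the remaining episode time to the
# observation (assumes a gymnasium-compatible sb3-contrib release).
import gymnasium as gym
from sb3_contrib.common.wrappers import TimeFeatureWrapper

env = TimeFeatureWrapper(gym.make("Pendulum-v1"))
obs, info = env.reset()
print(obs.shape)  # original observation plus one time feature
```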

Documentation

Documentation is available online: https://sb3-contrib.readthedocs.io/

Installation

To install Stable Baselines3 contrib with pip, execute:

pip install sb3-contrib

We recommend using the master version of Stable Baselines3.

To install Stable Baselines3 master version:

pip install git+https://github.com/DLR-RM/stable-baselines3

To install Stable Baselines3 contrib master version:

pip install git+https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
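
Once installed, contrib algorithms follow the familiar Stable-Baselines3 API. A short example run with TQC (default hyperparameters; a sketch, not a tuned configuration):

```python
# Quick start: contrib algorithms share the Stable-Baselines3 API.
from sb3_contrib import TQC

model = TQC("MlpPolicy", "Pendulum-v1", verbose=1)
model.learn(total_timesteps=10_000)
```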

How To Contribute

If you want to contribute, please read the CONTRIBUTING.md guide first.

Citing the Project

To cite this repository in publications (please cite SB3 directly):

@article{stable-baselines3,
  author  = {Antonin Raffin and Ashley Hill and Adam Gleave and Anssi Kanervisto and Maximilian Ernestus and Noah Dormann},
  title   = {Stable-Baselines3: Reliable Reinforcement Learning Implementations},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {268},
  pages   = {1-8},
  url     = {http://jmlr.org/papers/v22/20-1364.html}
}