lifelong_rl

Overview

PyTorch implementations of RL algorithms, focusing on model-based, lifelong, reset-free, and offline algorithms. Official codebase for Reset-Free Lifelong Learning with Skill-Space Planning. Originally derived from rlkit.

Status

The project is released and will receive periodic updates. Contributions, bug reports, benchmarking results, and other comments are welcome.

Algorithms in this codebase

Reset-Free RL
- Lifelong Skill-Space Planning* (Lu et al. 2020)
Model-Based Online RL*
- Model-Based Policy Optimization (Janner et al. 2019)
- Model Predictive Control (e.g. Chua et al. 2018)
- Learning Off-Policy with Online Planning (Sikchi et al. 2020)
Online Skill Discovery/Multitask RL
- Dynamics-Aware Discovery of Skills (Sharma et al. 2019)
- Hindsight Experience Replay (Andrychowicz et al. 2017)
Offline RL
- Model-Based Offline RL* (Kidambi et al. 2020)
- Model-Based Offline Policy Optimization* (Yu et al. 2020)
Model-Free Online RL
- Soft Actor Critic (Haarnoja et al. 2018)
- Twin Delayed Deep Deterministic Policy Gradient (Fujimoto et al. 2018)
- Proximal Policy Optimization (Schulman et al. 2017)
- Deep Deterministic Policy Gradient (Lillicrap et al. 2015)
- Natural Policy Gradient
- Vanilla Policy Gradient

Note: "Online" here means not offline, i.e. data is being collected in an environment. "Batch" refers to algorithms that learn from data in batches rather than from a replay buffer (e.g. PPO); it is not used as a synonym for offline RL.

*Reward and terminal functions are learned in this codebase for flexibility, but we also support providing these by hand.

Usage

Installation

Install Anaconda environment
```
$ conda env create -f environment.yml
```
Optionally, also install MuJoCo: see instructions here.
Install doodad to run experiments (v0.2).

Running experiments

You can run experiments with:

python run_scripts/<script name>.py

Use -h to see more options for running. Experiments require a variant dictionary (as in rlkit), which specifies a base setting for each hyperparameter. Experiments also require a sweep_values dictionary, which should contain only the hyperparameters that will be swept over (overwriting the original value in variant).
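
The relationship between the two dictionaries can be sketched as follows. This is a minimal illustration under stated assumptions, not the launcher used by this codebase: the helper name `expand_sweep` and the hyperparameter names are hypothetical, the dictionaries are assumed to be flat, and every swept entry is assumed to be a list of candidate values combined as a grid.

```python
import itertools


def expand_sweep(variant, sweep_values):
    """Yield one config per combination of swept values.

    Hypothetical helper: assumes flat dictionaries and that every value
    in sweep_values is a list of candidate settings.
    """
    keys = list(sweep_values)
    for combo in itertools.product(*(sweep_values[k] for k in keys)):
        config = dict(variant)           # start from the base settings
        config.update(zip(keys, combo))  # swept values overwrite the base
        yield config


# Illustrative hyperparameter names, not taken from the codebase.
variant = {"learning_rate": 3e-4, "batch_size": 256, "discount": 0.99}
sweep_values = {"learning_rate": [1e-4, 3e-4], "batch_size": [128, 256]}

configs = list(expand_sweep(variant, sweep_values))
```

Hyperparameters that appear only in variant (here `discount`) keep their base value in every generated config.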

Logging experiments

Results from experiments are saved in data/, and a snapshot containing the networks needed to evaluate policies offline is stored in itr_$n every save_snapshot_every epochs. Data from the offline training phase is stored in offline_itr_$n instead. We support Viskit for plotting or Weights and Biases (include -w True in the call to the run script).
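
As a rough sketch of working with this naming scheme, the snippet below picks the most recent snapshot in an experiment directory. The function `latest_snapshot` is a hypothetical helper, not part of the codebase; it assumes only that snapshot names begin with itr_ (or offline_itr_) followed by the epoch number, and it ignores any file extension because none is specified here.

```python
import re
from pathlib import Path


def latest_snapshot(exp_dir, prefix="itr_"):
    """Return the path of the highest-numbered snapshot in exp_dir, or None.

    Assumes snapshot names start with `prefix` followed by the epoch number.
    """
    pattern = re.compile(r"^" + re.escape(prefix) + r"(\d+)")
    best_epoch, best_path = -1, None
    for path in Path(exp_dir).iterdir():
        match = pattern.match(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

Passing `prefix="offline_itr_"` selects snapshots from the offline training phase instead; the default prefix does not match those names because the pattern is anchored at the start.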

Visualizing experiments

scripts/viz_hist.py can be used to record a video from a MuJoCo environment using stored data from the agent's replay buffer, which is modified to additionally store env sim states for MuJoCo environments. There are also a variety of ways visualization can be done manually.

Repo structure

agent_data/
- Stores .pkl files of numpy arrays of past transitions
- Useful for demonstrations, offline data, etc.
- You can download some example datasets from our link here
data/
- Stores logging information and experiment models
- itr_$n is the snapshot after epoch $n; similarly offline_itr_$n is for offline training
experiment_configs/
- Experiment configuration files
- get_config creates a dictionary consisting of networks and parameters used to initialize a run
- get_offline_algorithm and get_algorithm create an RLAlgorithm from the config
experiment_utils/
- Files associated with launching experiments with doodad (should not require modification)
lifelong_rl/
- Main codebase
run_scripts/
- Scripts to launch experiments: pick config, algorithm, hyperparameters
- If both an offline algorithm and an algorithm are specified, the offline algorithm is run first
- Specify hyperparameters for runs in variant
- Optionally, perform a grid search over some hyperparameters using sweep_params
scripts/
- Example utility scripts
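
To get a feel for the .pkl files in agent_data/, a quick inspection along these lines can help. This is a sketch under assumptions: `summarize_transitions` is a hypothetical helper, and it assumes each file holds a dictionary mapping names to arrays, which may not match the actual layout. Only unpickle files from sources you trust.

```python
import pickle


def summarize_transitions(path):
    """Load a pickle file and report the shape (or length) of each entry.

    Assumes the file holds a dictionary mapping names to arrays; the real
    layout of files in agent_data/ may differ.
    """
    with open(path, "rb") as f:
        data = pickle.load(f)
    summary = {}
    for key, value in data.items():
        # numpy arrays expose .shape; fall back to len() for plain sequences
        shape = getattr(value, "shape", None)
        summary[key] = tuple(shape) if shape is not None else (len(value),)
    return summary
```

Loading a file that contains numpy arrays requires numpy to be installed, since pickle imports it while reconstructing the arrays.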

Acknowledgements

This codebase was originally modified from rlkit. Some parts of the code are taken from ProMP, mjrl, handful-of-trials-pytorch, and dads.

Citation

This is the official codebase for Reset-Free Lifelong Learning with Skill-Space Planning. Note that the code has been modified since the paper was published, so results may differ slightly.

@inproceedings{lu2021lisp,
  title     = {Reset-Free Lifelong Learning with Skill-Space Planning},
  author    = {Kevin Lu and
               Aditya Grover and
               Pieter Abbeel and
               Igor Mordatch},
  booktitle = {9th International Conference on Learning Representations, {ICLR} 2021,
               Virtual Event, Austria, May 3-7, 2021},
  year      = {2021}
}

License

MIT
