lifelong_rl

Overview

PyTorch implementations of RL algorithms, focusing on model-based, lifelong, reset-free, and offline algorithms. Official codebase for Reset-Free Lifelong Learning with Skill-Space Planning. Originally derived from rlkit.

Status

The project is released and will receive periodic updates. Contributions, bug reports, benchmarking results, and other comments are welcome.

Algorithms in this codebase

Note: "Online" here means not offline, i.e., data is being collected in an environment. "Batch" refers to algorithms that learn from data in batches rather than from a replay buffer (e.g., PPO); it is not used as a synonym for offline RL.

*Reward and terminal functions are learned in this codebase for flexibility, but they can also be provided by hand.

Usage

Installation

  1. Install Anaconda environment

    $ conda env create -f environment.yml
    

    Optionally, also install MuJoCo: see instructions here.

  2. Install doodad to run experiments (v0.2).

Running experiments

You can run experiments with:

python run_scripts/<script name>.py

Use -h to see more options for running. Experiments require a variant dictionary (as in rlkit), which specifies a base setting for each hyperparameter. Experiments also require a sweep_values dictionary, which should contain only the hyperparameters to be swept over; each swept value overwrites the original value in variant.
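As a rough illustration of how variant and sweep_values relate, the sketch below expands a sweep into one settings dictionary per run. The hyperparameter names and the expand_sweep helper are hypothetical and are not taken from this codebase; the sketch also assumes flat (non-nested) keys, which the actual run scripts may not use.

```python
import itertools

# Base setting for every hyperparameter (names are made up for illustration).
variant = {
    "seed": 0,
    "policy_lr": 3e-4,
    "batch_size": 256,
}

# Only the hyperparameters that are swept; each maps to a list of values.
sweep_values = {
    "seed": [0, 1, 2],
    "policy_lr": [3e-4, 1e-3],
}


def expand_sweep(variant, sweep_values):
    """Yield one full settings dict per point in the sweep grid."""
    keys = list(sweep_values)
    for combo in itertools.product(*(sweep_values[k] for k in keys)):
        run_variant = dict(variant)            # start from the base settings
        run_variant.update(zip(keys, combo))   # overwrite the swept entries
        yield run_variant


runs = list(expand_sweep(variant, sweep_values))
print(len(runs))  # 3 seeds x 2 learning rates = 6 runs
```

Hyperparameters that do not appear in sweep_values (batch_size here) keep their base value from variant in every run.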

Logging experiments

Results from experiments are saved in data/. Every save_snapshot_every epochs, a snapshot containing the networks needed to evaluate policies offline is stored in itr_$n. Data from the offline training phase is stored in offline_itr_$n instead. We support Viskit for plotting or Weights and Biases (include -w True in the call to the run script).
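When browsing a run's output, it can help to separate online and offline snapshots and order them by epoch. The helper below is a minimal sketch that relies only on the itr_$n and offline_itr_$n naming described above; sort_snapshots is a hypothetical function, not part of this codebase, and the optional file extension it accepts is an assumption.

```python
import re

# Matches "itr_12" and "offline_itr_12", with an optional file extension.
_SNAPSHOT_RE = re.compile(r"^(offline_)?itr_(\d+)(\.\w+)?$")


def sort_snapshots(names):
    """Split snapshot names into (online, offline) lists ordered by epoch."""
    online, offline = [], []
    for name in names:
        match = _SNAPSHOT_RE.match(name)
        if match is None:
            continue  # not a snapshot, e.g. a log or config file
        epoch = int(match.group(2))
        target = offline if match.group(1) else online
        target.append((epoch, name))
    return [n for _, n in sorted(online)], [n for _, n in sorted(offline)]


# Example listing of one run's output (contents are made up).
listing = ["itr_10", "itr_2", "offline_itr_0", "progress.csv", "offline_itr_5"]
online, offline = sort_snapshots(listing)
print(online)
print(offline)
```

Sorting on the parsed integer rather than the raw string keeps itr_2 ahead of itr_10.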

Visualizing experiments

scripts/viz_hist.py can be used to record a video from a MuJoCo environment using stored data from the agent's replay buffer, which is modified to additionally store env sim states for MuJoCo environments. There are also a variety of ways visualization can be done manually.

Repo structure

  • agent_data/
    • Stores .pkl files of numpy arrays of past transitions
    • Useful for demonstrations, offline data, etc.
    • You can download some example datasets from our link here
  • data/
    • Stores logging information and experiment models
    • itr_$n is the snapshot after epoch $n; similarly offline_itr_$n is for offline training
  • experiment_configs/
    • Experiment configuration files
    • get_config creates a dictionary consisting of networks and parameters used to initialize a run
    • get_offline_algorithm and get_algorithm create an RLAlgorithm from the config
  • experiment_utils/
    • Files associated with launching experiments with doodad (should not require modification)
  • lifelong_rl/
    • Main codebase
  • run_scripts/
    • Scripts to launch experiments: pick config, algorithm, hyperparameters
    • If both an offline algorithm and an algorithm are specified, the offline algorithm is run first
    • Should specify hyperparameters for runs in variant
    • Optionally, perform a grid search over some hyperparameters using sweep_params
  • scripts/
    • Example utility scripts

Acknowledgements

This codebase was originally modified from rlkit. Some parts of the code are taken from ProMP, mjrl, handful-of-trials-pytorch, and dads.

Citation

This is the official codebase for Reset-Free Lifelong Learning with Skill-Space Planning. Note that the code has been modified since the paper was published, so results may differ slightly.

@inproceedings{lu2021lisp,
  title     = {Reset-Free Lifelong Learning with Skill-Space Planning},
  author    = {Kevin Lu and
               Aditya Grover and
               Pieter Abbeel and
               Igor Mordatch},
  booktitle = {9th International Conference on Learning Representations, {ICLR} 2021,
               Virtual Event, Austria, May 3-7, 2021},
  year      = {2021}
}

License

MIT