Terminator (TermPG)

Open-source codebase for Terminator Policy Gradient (TermPG), from "Reinforcement Learning with a Terminator" (NeurIPS 2022).

Installation

To use Terminator, make sure Python 3 is installed and pip is up to date. This project was tested with Python 3.8.

Clone the Repository

    git clone https://github.com/guytenn/Terminator.git

We recommend installing the requirements in a virtual environment. To set one up, follow these steps:

    cd Terminator/
    python3 -m venv terminator_env
    source terminator_env/bin/activate
    # Upgrade Pip
    pip install --upgrade pip

Install Requirements

From the Terminator directory, install the requirements with the following command:

    pip install .
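
If the install step misbehaves, it can help to confirm which interpreter is running and whether the virtual environment is active. The sketch below is not part of the repository; the helper names `in_virtualenv` and `version_matches` are hypothetical. It compares the running interpreter against Python 3.8 (the version this project was tested with) and uses the standard `sys.prefix != sys.base_prefix` check for an active venv.

```python
import sys

TESTED_VERSION = (3, 8)  # this README states the project was tested with Python 3.8


def in_virtualenv() -> bool:
    """Return True when running inside an environment created by `python3 -m venv`."""
    return sys.prefix != sys.base_prefix


def version_matches(tested=TESTED_VERSION, current=None) -> bool:
    """Return True when the major.minor version equals the tested one."""
    current = sys.version_info[:2] if current is None else current
    return tuple(current[:2]) == tuple(tested)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("python version:", ".".join(map(str, sys.version_info[:3])))
    if not version_matches():
        # Hypothetical warning text; other versions may or may not work.
        print("note: this differs from the tested version 3.8")
```

Run it with the same `python3` you intend to use for `pip install .`, after activating `terminator_env`.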

Download Backseat Driver

You can find the latest version of Backseat Driver here (currently Linux only).

Download and unzip the files to src/envs/backseat_driver/build/

After unzipping, the build directory should contain the following entries:

    src/envs/backseat_driver/build/BackseatDriverTerm_BurstDebugInformation_DoNotShip
    src/envs/backseat_driver/build/BackseatDriverTerm_Data
    src/envs/backseat_driver/build/BackseatDriverTerm.x86_64
    src/envs/backseat_driver/build/UnityPlayer.so
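
The layout above can be checked with a short script before the first run. This is a minimal sketch, not a tool shipped with the repository; the function name `missing_entries` is hypothetical, and the script assumes it is run from the Terminator directory so that the relative path resolves.

```python
from pathlib import Path

# Paths taken from the layout listed in this README.
BUILD_DIR = Path("src/envs/backseat_driver/build")
EXPECTED = [
    "BackseatDriverTerm_BurstDebugInformation_DoNotShip",
    "BackseatDriverTerm_Data",
    "BackseatDriverTerm.x86_64",
    "UnityPlayer.so",
]


def missing_entries(build_dir=BUILD_DIR, expected=EXPECTED):
    """Return the expected file or directory names that are absent from build_dir."""
    build_dir = Path(build_dir)
    return [name for name in expected if not (build_dir / name).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print(f"missing from {BUILD_DIR}: {', '.join(missing)}")
    else:
        print("Backseat Driver build looks complete")
```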

Usage

Quick start

The following command runs TermPG with its default parameters:

    python3 run.py --learn_costs --termination_gamma
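
When launching several runs with different settings, it can be convenient to assemble the command programmatically. The sketch below is an illustration only: `build_command` is a hypothetical helper, not part of this codebase. It uses only flags documented in this README and assumes `run.py` is invoked from the Terminator directory.

```python
import subprocess
import sys


def build_command(flags=(), options=None, script="run.py"):
    """Build an argv list: boolean flags first, then `--name value` pairs."""
    cmd = [sys.executable, script]
    cmd += [f"--{name}" for name in flags]
    for name, value in (options or {}).items():
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        flags=["learn_costs", "termination_gamma"],
        # Values below are the defaults listed in this README.
        options={"n_ensemble": 3, "window": 30},
    )
    print(" ".join(cmd))
    # Uncomment to actually launch training from the Terminator directory:
    # subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues, and `sys.executable` keeps the run inside the active virtual environment.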

The arguments you can change are listed below.

| TermPG Arguments | Description |
| --- | --- |
| --learn_costs | Learn costs according to TermPG |
| --termination_gamma | Use dynamic discount factor according to TermPG |
| --cost_coef 1 | Cost coefficient for termination in environment |
| --bonus_coef 1 | Bonus coefficient for termination cost confidence in TermPG |
| --bonus_type 'maxmin' | Type of bonus to use for costs (one of: 'none', 'std', 'maxmin') |
| --reward_penalty_coef 0 | Penalty coefficient for costs (penalize reward by estimated costs) |
| --termination_penalty 0 | A penalty for termination (reward shaping variant) |
| --reward_bonus_coef 0 | Bonus coefficient for optimism in costs |
| --window 30 | Window size for termination |
| --env_window -1 | The real window the env will use for termination. If -1, the default window is used. |
| --n_ensemble 3 | Number of networks to use in cost model ensemble |
| --term_train_steps 30 | Number of train steps to train terminator |
| --term_batch_size 64 | Batch size for terminator |
| --term_replay_size 1000 | Replay size for terminator |
| --cost_in_state | Adds the true cost to the state (TerMDP with known costs) |
| --no_termination | Disables termination in the environment |
| --cost_history_in_state | Adds the history of costs to the state in addition to the accumulated cost |

| General Arguments | Description |
| --- | --- |
| --train_timesteps 1000000 | Number of simulation timesteps to train a policy |
| --train_batch_size 1024 | Number of timesteps collected for each SGD round. This defines the size of each SGD epoch. |
| --batch_size 32 | Total SGD batch size across all devices for SGD. This defines the minibatch size within each epoch. |
| --num_epochs 3 | Number of SGD iterations in each outer loop (i.e., number of epochs to execute per train batch). |
| --graphics | When enabled, renders the environment |
| --wandb | Log to wandb |
| --project_name | Project name for wandb logging |
| --run_name | Run name for wandb logging |
| --num_processes 8 | Number of workers during training (a value of -1 uses all CPUs) |
| --num_gpus 1 | Number of GPUs to use for training |
| --entropy_coeff 0 | Entropy loss coefficient |
| --use_lstm | Use a recurrent policy |
| --clean_data | Removes all model files in src/data |
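
The descriptions of --train_timesteps, --train_batch_size, --batch_size, and --num_epochs imply a simple relationship between the defaults. The sketch below works through that arithmetic. It is an interpretation of the descriptions above, not code from this repository: `sgd_schedule` is a hypothetical helper, and rounding partial batches up is an assumption that the actual training loop may not share.

```python
import math


def sgd_schedule(train_timesteps=1_000_000, train_batch_size=1024,
                 batch_size=32, num_epochs=3):
    """Estimate SGD step counts from the documented default arguments."""
    # Outer loop: one iteration per collected train batch (rounded up, an assumption).
    outer_iterations = math.ceil(train_timesteps / train_batch_size)
    # Each epoch splits the train batch into minibatches of size batch_size.
    minibatches_per_epoch = math.ceil(train_batch_size / batch_size)
    # Each outer iteration runs num_epochs passes over the train batch.
    steps_per_iteration = minibatches_per_epoch * num_epochs
    return {
        "outer_iterations": outer_iterations,
        "minibatches_per_epoch": minibatches_per_epoch,
        "sgd_steps_per_iteration": steps_per_iteration,
        "total_sgd_steps": outer_iterations * steps_per_iteration,
    }


if __name__ == "__main__":
    for key, value in sgd_schedule().items():
        print(f"{key}: {value}")
```

With the defaults, this gives 32 minibatches per epoch and 96 SGD steps per collected batch of 1024 timesteps, so increasing --batch_size reduces the number of gradient updates per batch proportionally.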

Citation

To cite our paper, please use:

    @Article{tennenholtz2022reinforcement,
      title={Reinforcement Learning with a Terminator},
      author={Tennenholtz, Guy and Merlis, Nadav and Shani, Lior and Mannor, Shie and Shalit, Uri and Chechik, Gal and Hallak, Assaf and Dalal, Gal},
      journal={arXiv preprint arXiv:2205.15376},
      year={2022}
    }
