Understanding and Adopting Rational Behavior by Bellman Score Estimation

Author: Kuno Kim

This repo contains the official implementation for the ICLR submission.

Note: This repo is being actively updated.

Main Dependencies

PyTorch
cudatoolkit
PyYAML
hydra
dm_control

Requirements

We assume you have access to a GPU that can run CUDA 9.2 or above. We used pytorch==1.10.1 with cudatoolkit==11.3.1 in our experiments. The simplest way to install all required dependencies is to create an Anaconda environment and activate it:

conda env create -f conda_env.yaml
source activate gac

Next, install the ReparamModule package following the instructions from here.
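
Once the environment is set up, a quick sanity check can confirm that the main dependencies are importable and that PyTorch sees a CUDA device. The snippet below is a minimal sketch and not part of this repo: check_environment is a hypothetical helper, and it only tests importability, not the exact versions listed above.

```python
import importlib.util
import platform


def check_environment(packages=("torch", "yaml", "hydra", "dm_control")):
    """Report which packages are importable (PyYAML imports as `yaml`).

    Hypothetical helper for illustration; it is not shipped with the repo.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        # find_spec locates a top-level package without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    if report.get("torch"):
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If cuda_available is reported as False, the installed cudatoolkit and the GPU driver are the first things to compare.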

Running Experiments

Project Structure

train_gac.py is the common gateway to all experiments.

usage: train_gac.py env=ENV_NAME
                    experiment=EXP_NAME
                    seed=SEED
                    load_demo_path=PATH_TO_DEMO
                    load_expert_path=PATH_TO_EXPERT
                    num_transitions=NUM_DEMO

optional arguments:
  experiment          Name of experiment for logging purposes
  seed                Random seed
  load_demo_path      Global path to the saved expert demonstrations
  load_expert_path    Global path to the saved expert (only for evaluation purposes)
  num_transitions     Number of demonstrations to use

Configuration files are stored in config/. For example, the configuration files for GAC are config/imitate.yaml and config/agent/gac.yaml. Log files, including the tensorboard files, are typically stored in exp/.

Training

Download the expert demonstrations and place them in gac/saved_demo. Each pickle file contains 1000 demonstration trajectories for a different environment, and the file names match the environment names. train_gac.py follows the usage shown above. For example, we can train GAC for the walker_walk task with one demonstration by running

python train_gac.py env='walker_walk' experiment='walker_walk' seed=0 load_demo_path=/user/gac/saved_demo/walker_walk.pickle load_expert_path=/user/gac/saved_experts/walker_walk.pt num_transitions=1
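
Before launching a long run, it can help to confirm that the demonstration file passed as load_demo_path actually loads. The internal layout of the pickle files is not documented here, so the sketch below only reports the top-level type and size of whatever object is stored. summarize_demo_file is a hypothetical helper, not part of this repo. If the file stores custom classes, it may need to be run from the repo root so those classes can be imported.

```python
import pickle
from pathlib import Path


def summarize_demo_file(path):
    """Load a demonstration pickle and describe its top-level object.

    Only unpickle files from a source you trust: pickle can execute
    arbitrary code while loading.
    """
    path = Path(path)
    with path.open("rb") as f:
        data = pickle.load(f)
    summary = {"file": path.name, "type": type(data).__name__}
    if hasattr(data, "__len__"):
        summary["length"] = len(data)
    if isinstance(data, dict):
        # Keys are stringified so mixed key types still sort cleanly.
        summary["keys"] = sorted(str(k) for k in data)
    return summary
```

For instance, summarize_demo_file("/user/gac/saved_demo/walker_walk.pickle") would describe the file used in the command above.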

Choose from the following environments: walker_stand, walker_walk, hopper_stand, cheetah_run, quadruped_run.
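
To run several environments and seeds, the command line can be generated rather than typed by hand. The sketch below only uses the arguments documented in the usage section. It assumes that demonstration and expert files are named after the environment (<env>.pickle and <env>.pt), as in the walker_walk example, and the directory arguments are placeholders. build_command and build_sweep are hypothetical helpers, not part of this repo.

```python
import shlex

ENVS = ["walker_stand", "walker_walk", "hopper_stand", "cheetah_run", "quadruped_run"]


def build_command(env, seed, demo_dir, expert_dir, num_transitions=1):
    """Return the argument list for one train_gac.py run.

    Assumes files are named <env>.pickle and <env>.pt inside the given
    directories; adjust if your layout differs.
    """
    return [
        "python", "train_gac.py",
        f"env={env}",
        f"experiment={env}",
        f"seed={seed}",
        f"load_demo_path={demo_dir}/{env}.pickle",
        f"load_expert_path={expert_dir}/{env}.pt",
        f"num_transitions={num_transitions}",
    ]


def build_sweep(envs, seeds, demo_dir, expert_dir):
    """Return one command per (environment, seed) pair."""
    return [build_command(e, s, demo_dir, expert_dir) for e in envs for s in seeds]


if __name__ == "__main__":
    sweep = build_sweep(ENVS, [0, 1, 2], "/user/gac/saved_demo", "/user/gac/saved_experts")
    for cmd in sweep:
        # Print instead of executing so the commands can be reviewed first.
        print(shlex.join(cmd))
```

The printed lines can be pasted into a shell or a job scheduler script. Running them directly from Python with subprocess.run(cmd, check=True) is another option.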

Evaluation

Running train_gac.py prints evaluation metrics to the console. The full names for the shorthand acronyms can be found in logger.py. In the evaluation step output, L_R is the average learner episode reward, which quantifies the control performance of the learner. Another convenient way to monitor training progress is to use tensorboard. For example, to visualize the runs started on 2022.10.01, one may run

tensorboard --logdir exp/2022.10.01 --port 8008

The evaluation metrics are then found at http://localhost:8008. The "learner_episode_reward" graph shows the average episode reward obtained during the evaluation step. A sample learning curve for the walker_walk task should look like so.

