Soft Q-learning (SQL) is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper Reinforcement Learning with Deep Energy-Based Policies presented at the International Conference on Machine Learning (ICML), 2017.
Soft Q-learning can be run either locally or through Docker.
You will need to have Docker and Docker Compose installed unless you want to run the environment locally.
Most of the models require a MuJoCo license.
Currently, rendering of simulations is not supported on Docker due to a missing display setup. As a workaround, you can use a local installation. If you want to run the MuJoCo environments without rendering, the Docker environment needs to know where to find your MuJoCo license key (mjkey.txt). You can either copy your key into <PATH_TO_THIS_REPOSITORY>/.mujoco/mjkey.txt, or you can specify the path to the key in your environment variables:
export MUJOCO_LICENSE_PATH=<path_to_mujoco>/mjkey.txt
Please note that you should not use paths that start with ~, such as ~/mujoco/mjkey.txt, in the environment variable, as Docker Compose does not expand them. You could use "$HOME/.mujoco/mjkey.txt" instead.
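If you take the environment-variable route, a small check before starting the container can catch the path problem described above. The following is a minimal sketch, not part of the repository: the function name check_license_path is hypothetical, and it only tests for the ~ prefix, absoluteness, and file existence on the host.

```python
import os


def check_license_path(value):
    """Return a list of problems with a candidate MUJOCO_LICENSE_PATH value.

    Hypothetical helper for illustration; not part of softqlearning.
    """
    problems = []
    if not value:
        problems.append("MUJOCO_LICENSE_PATH is not set")
        return problems
    if value.startswith("~"):
        # Docker Compose passes the string through without expanding "~".
        problems.append("path starts with '~', which Docker Compose does not expand")
    elif not os.path.isabs(value):
        problems.append("path is not absolute")
    # Only look for the file once the path itself is well-formed.
    if not problems and not os.path.isfile(value):
        problems.append("no file found at " + value)
    return problems


if __name__ == "__main__":
    for problem in check_license_path(os.environ.get("MUJOCO_LICENSE_PATH", "")):
        print("warning:", problem)
```

An empty list means the value looks usable. If you copied the key into the repository's .mujoco folder instead, the variable does not need to be set and the first warning can be ignored.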
Once that's done, you can run the Docker container with
docker-compose up
Docker Compose creates a Docker container named soft-q-learning and automatically sets the needed environment variables and volumes.
You can access the container with the usual docker exec command, for example:
docker exec -it soft-q-learning bash
See the examples section for how to train and simulate the agents.
To clean up the setup:
docker-compose down
To get the environment installed correctly, you first need to clone rllab and add its path to your PYTHONPATH environment variable.
- Clone rllab
cd <installation_path_of_your_choice>
git clone https://github.com/rll/rllab.git
cd rllab
git checkout b3a28992eca103cab3cb58363dd7a4bb07f250a0
export PYTHONPATH=$(pwd):${PYTHONPATH}
- Download and copy MuJoCo files to rllab path:
If you're running on OSX, download https://www.roboti.us/download/mjpro131_osx.zip instead, and copy the .dylib files instead of the .so files.
mkdir -p /tmp/mujoco_tmp && cd /tmp/mujoco_tmp
wget -P . https://www.roboti.us/download/mjpro131_linux.zip
unzip mjpro131_linux.zip
mkdir <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libmujoco131.so <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libglfw.so.3 <installation_path_of_your_choice>/rllab/vendor/mujoco
cd ..
rm -rf /tmp/mujoco_tmp
- Copy your MuJoCo license key (mjkey.txt) to rllab path:
cp <mujoco_key_folder>/mjkey.txt <installation_path_of_your_choice>/rllab/vendor/mujoco
- Clone softqlearning
cd <installation_path_of_your_choice>
git clone https://github.com/haarnoja/softqlearning.git
- Create and activate conda environment
cd softqlearning
conda env create -f environment.yml
source activate sql
The environment should now be ready to run. See the examples section for how to train and simulate the agents.
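Before training, it can help to confirm that the steps above took effect. The sketch below is an illustration under stated assumptions rather than a script shipped with the repository: the helper names are hypothetical, and the expected file names are the Linux ones from the copy commands above (on OSX you would list the .dylib files you copied).

```python
import importlib.util
import os

# File names taken from the installation steps above (Linux build).
EXPECTED_FILES = ("libmujoco131.so", "libglfw.so.3", "mjkey.txt")


def missing_mujoco_files(rllab_root, expected=EXPECTED_FILES):
    """List expected files that are absent from <rllab_root>/vendor/mujoco."""
    vendor_dir = os.path.join(rllab_root, "vendor", "mujoco")
    return [name for name in expected
            if not os.path.isfile(os.path.join(vendor_dir, name))]


def rllab_importable():
    """Return True if Python can locate rllab, i.e. PYTHONPATH is set up."""
    return importlib.util.find_spec("rllab") is not None
```

Calling missing_mujoco_files with your rllab checkout path should return an empty list, and rllab_importable() should return True in the shell where you exported PYTHONPATH.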
Finally, to deactivate and remove the conda environment:
source deactivate
conda remove --name sql --all
- To train the agent
python ./examples/mujoco_all_sql.py --env=swimmer --log_dir="/root/sql/data/swimmer-experiment"
- To simulate the agent (NOTE: This step currently fails with the Docker installation, due to missing display.)
python ./scripts/sim_policy.py /root/sql/data/swimmer-experiment/itr_<iteration>.pkl
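The simulate command needs a concrete iteration number in place of <iteration>. As a convenience, the following sketch picks the snapshot with the highest iteration from a log directory. It assumes snapshots are named itr_<N>.pkl with an integer N, as in the command above; the function name latest_snapshot is hypothetical and not part of the repository.

```python
import os
import re


def latest_snapshot(log_dir):
    """Return the path of the itr_<N>.pkl file with the largest N, or None."""
    pattern = re.compile(r"^itr_(\d+)\.pkl$")
    best = None  # (iteration, file name) of the newest snapshot seen so far
    for name in os.listdir(log_dir):
        match = pattern.match(name)
        if match:
            iteration = int(match.group(1))
            if best is None or iteration > best[0]:
                best = (iteration, name)
    return os.path.join(log_dir, best[1]) if best else None
```

The returned path can be passed to sim_policy.py as its argument. Comparing iterations as integers avoids the lexicographic trap where itr_9.pkl would sort after itr_10.pkl.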
mujoco_all_sql.py contains several different environments, and there are more example scripts available in the /examples folder. For more information about the agents and configurations, run the scripts with the --help flag. For example:
python ./examples/mujoco_all_sql.py --help
usage: mujoco_all_sql.py [-h]
[--env {ant,walker,swimmer,half-cheetah,humanoid,hopper}]
[--exp_name EXP_NAME] [--mode MODE]
[--log_dir LOG_DIR]
It is also possible to merge two existing maximum entropy policies to form a new composed skill that approximately optimizes both constituent tasks simultaneously, as discussed in Composable Deep Reinforcement Learning for Robotic Manipulation. To run the pusher experiment described in the paper, first train two policies for the constituent tasks ("push the object to the given x-coordinate" and "push the object to the given y-coordinate") by running
python ./examples/pusher_pretrain.py --log_dir=/root/sql/data/pusher
You can then combine the two policies to form a combined skill ("push the object to the given x and y coordinates"), without collecting more experience from the environment, with
python ./examples/pusher_combine.py --log_dir=/root/sql/data/pusher/combined \
--snapshot1=/root/sql/data/pusher/00/params.pkl \
--snapshot2=/root/sql/data/pusher/01/params.pkl
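To give some intuition for why no new experience is needed, here is a toy sketch of the idea. It assumes the composed Q-function is approximated by the average of the constituent Q-functions and that the policy is proportional to exp(Q / temperature). It uses a handful of discrete actions and made-up Q-values purely for illustration; the actual scripts work with continuous actions and neural networks, and pusher_combine.py may differ in its details.

```python
import math


def soft_policy(q_values, temperature=1.0):
    """Boltzmann policy: pi(a) proportional to exp(Q(a) / temperature)."""
    scaled = [q / temperature for q in q_values]
    shift = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def compose(q_a, q_b):
    """Average two Q-functions (assumed approximation for the composed skill)."""
    return [(a + b) / 2.0 for a, b in zip(q_a, q_b)]


# Made-up Q-values over four actions for two constituent tasks.
q_x = [2.0, 0.0, 1.0, 0.0]  # task "reach the target x-coordinate"
q_y = [0.0, 2.0, 1.0, 0.0]  # task "reach the target y-coordinate"
combined_policy = soft_policy(compose(q_x, q_y))
```

Because each maximum entropy policy keeps probability mass on actions that are only moderately good for its own task, the averaged Q-function can still rank actions sensibly for the combined task, while an action that is poor for both tasks (the last one here) stays unlikely.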
The soft Q-learning algorithm was developed by Haoran Tang and Tuomas Haarnoja under the supervision of Prof. Sergey Levine and Prof. Pieter Abbeel at UC Berkeley. Special thanks to Vitchyr Pong, who wrote some parts of the code, and Kristian Hartikainen, who helped with testing, documenting, and polishing the code and streamlining the installation process. The work was supported by Berkeley Deep Drive.
@inproceedings{haarnoja2017reinforcement,
title={Reinforcement Learning with Deep Energy-Based Policies},
author={Haarnoja, Tuomas and Tang, Haoran and Abbeel, Pieter and Levine, Sergey},
booktitle={International Conference on Machine Learning},
year={2017}
}
@inproceedings{haarnoja2018composable,
title={Composable Deep Reinforcement Learning for Robotic Manipulation},
author={Haarnoja, Tuomas and Pong, Vitchyr and Zhou, Aurick and Dalal, Murtaza and Abbeel, Pieter and Levine, Sergey},
booktitle={International Conference on Robotics and Automation},
year={2018}
}