
pytorch-rl-lab

DDPG and MPC implementations by Group 06: Frederik Wegner and Alexander Lind.

Installation Guide

This guide assumes you are working under Ubuntu 16.04.

  1. Make sure you have Python >= 3.5.3 on your system. If that is not the case, install Python 3.6:

     sudo add-apt-repository ppa:deadsnakes/ppa
     sudo apt-get update
     sudo apt-get install python3.6
     sudo apt-get install python3.6-venv
    
  2. Clone this repository into some folder:

     git clone git@github.com:al91liwo/pytorch-rl-lab.git
         or
     git clone https://github.com/al91liwo/pytorch-rl-lab.git
    
  3. Create a virtual environment, activate it, and update it. You can also use an Anaconda virtual environment.

     python3.6 -m venv venv3
     source venv3/bin/activate
     pip3 install -U pip setuptools
    
  4. Install the requirements.

     pip3 install -r requirements.txt
    
  5. Check that everything works correctly by running the code snippets from the quanser_environment example and the pytorch-rl-lab example.
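
     For instance, here is a minimal sanity check of the simulation side. This is a sketch, not code from the repository: it assumes the Quanser package registers its environments with gym under the module name quanser_robots, and it uses the environment name from the example further below.

         import gym
         import quanser_robots  # assumed module name; registers the Quanser sim environments with gym

         env = gym.make("CartpoleStabShort-v0")
         obs = env.reset()
         for _ in range(100):  # short random rollout
             obs, reward, done, info = env.step(env.action_space.sample())
             if done:
                 obs = env.reset()
         env.close()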

Getting started

You can always use the command line to start either a training or trial session with a given algorithm.

python main.py -h

With python main.py you can specify an algorithm to use. Right now you can use ddpg or mpc.

positional arguments:
algorithm   algorithm specified in src/algorithm/
{rr,sim}    choose between simulation or real environment mode

After you've chosen your algorithm, you can either run a session in simulation or real environment mode.

python main.py ddpg sim -h

In either simulation or real environment mode you can choose between train and trial mode.

positional arguments:
{train,trial}  choose between train or trial
train        train mode in simulated environment
trial        trial mode in simulated environment
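
Putting the pieces together, every invocation follows the same shape (a synopsis assembled from the help texts above):

    python main.py <algorithm> {rr,sim} {train,trial} [arguments...]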

In train mode you always have to specify a parameters.csv file and an output directory.

python main.py ddpg sim train -h

You can have a look at the parameters.csv example and a typical output directory.

positional arguments:
hyperparameters  .csv file with hyperparameters for the specified algorithm
outdir           output directory of your training data

In trial mode you always have to specify a folder containing a parameters.csv and a policy, plus the number of episodes to run your policy for.

python main.py ddpg sim trial -h

You can try the trial example.

positional arguments:
policy      path to your policy
outdir      save your results in specified directory
episodes    number of episodes to start your trial in sim mode

Example

For example, you can train the DDPG algorithm with hyperparameters given as a .csv file, such as this parameters.csv:

The file itself is one header row and one value row; it is transposed here for readability:

    parameter             value
    run_id                CartpoleTrial
    env                   CartpoleStabShort-v0
    steps                 300000
    batch_size            64
    buffer_size           1000000
    warmup_samples        20000
    actor_lr              0.001
    critic_lr             0.01
    actor_hidden_layers   [100, 100, 50]
    critic_hidden_layers  [100, 100]
    tau                   0.01
    noise_decay           0.99
    lr_decay              1.0
    lr_min                1e-08
    batch_norm            False
    trial_horizon         5000
    action_space_limits   ([-5.0], [5.0])
    dirname               out/CartpoleTrial_CartpoleStabShort-v0
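
For orientation, tau in this table is the coefficient of DDPG's soft target-network update. A minimal sketch of that update rule (not the repository's exact code; target and source stand for two identically shaped torch.nn.Module instances, such as the actor and its target network):

    import torch

    def soft_update(target, source, tau):
        # theta_target <- tau * theta_source + (1 - tau) * theta_target
        with torch.no_grad():
            for t, s in zip(target.parameters(), source.parameters()):
                t.mul_(1.0 - tau).add_(tau * s)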

Execute this command to obtain results:

python main.py ddpg sim train parameters.csv out

out specifies the directory where the output results will be saved (its layout is strictly specified by the developer); for more information, take a look at the config readme.

train is the command that trains the specified algorithm under the given hyperparameters (the parameters.csv file).

Your output should be something like this:

[screenshot: training console output]

And this plot appears in your specified outdir:

[plot: training reward curve]

To trial your models, choose a model from the outdir that fits your needs. The model files you need are always named actortarget followed by a number representing the reward obtained in the training session.
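
A hedged sketch of inspecting such a saved policy with PyTorch (the file name below is hypothetical; whether torch.load returns a full module or a state_dict depends on how the training code saved it):

    import torch

    # hypothetical file name; pick an actortarget file from your own outdir
    policy = torch.load("out/CartpoleTrial_CartpoleStabShort-v0/actortarget_10000",
                        map_location="cpu")
    print(policy)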

We choose the model that gained approximately 10,000 reward, create a new folder called test_model, save the policy there under the name policy, and copy the corresponding parameters.csv into test_model.
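
The resulting folder then contains exactly these two files:

    test_model/
        parameters.csv
        policy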

Now we can execute the model and obtain our results graphically.

python main.py ddpg sim trial test_model result 100

Your reward plot for the policy will look like this:

[plot: trial reward curve]

You can see that the approximate reward is 10,000.

The obtained policy looks like this (we let the policy render only once):

[gif: rendered policy]

Real environment

If you want to execute this example on the real environment, just train on the real environment:

    python main.py ddpg rr train parameters.csv out

Put your policy and parameters.csv in a new directory, for example test_model:

    python main.py ddpg rr trial test_model result 1

After executing this command you will see similar results:

[gif: policy running on the real environment]

Have fun testing parameters and writing your own algorithms!

Troubleshooting

If you have problems with training or trial sessions, make sure your output folders are empty, your hyperparameters file is always named parameters.csv, and your policy file is named policy.