DDPG and MPC implementation by Group 06 (Frederik Wegner and Alexander Lind)
This guide assumes you are working under Ubuntu 16.04

- Make sure you have Python >= 3.5.3 on your system. If that is not the case, install Python 3.6:

      sudo add-apt-repository ppa:deadsnakes/ppa
      sudo apt-get update
      sudo apt-get install python3.6
      sudo apt-get install python3.6-venv
- Clone this repository into some folder:

      git clone git@github.com:al91liwo/pytorch-rl-lab.git

  or

      git clone https://github.com/al91liwo/pytorch-rl-lab.git
- Create a virtual environment, activate it, and update it. You can also use an Anaconda virtual environment.

      python3.6 -m venv venv3
      source venv3/bin/activate
      pip3 install -U pip setuptools
- Install the requirements.

      pip3 install -r requirements.txt
- Check that everything works correctly by running the code snippets from the quanser_environment example and the pytorch-rl-lab example.
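
As a quick smoke test you can create one of the simulated Quanser environments and run a random rollout. The following is a minimal sketch: the import name quanser_robots and the gym-style API are assumptions based on the requirements, and the environment id is the one used in the parameters.csv example further below.

```python
# Minimal smoke test: run one episode with random actions in a simulated
# cartpole environment. Assumes the Quanser package from requirements.txt
# registers its environments with gym under this id.
import gym
import quanser_robots  # noqa: F401 -- importing registers the environments

env = gym.make('CartpoleStabShort-v0')
obs = env.reset()
done = False
episode_return = 0.0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    episode_return += reward
print("episode return:", episode_return)
env.close()
```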
You can always use the command line to start either a training or trial session with a given algorithm.
python main.py -h
With python main.py you can specify an algorithm to use. Right now you can use ddpg or mpc.

    positional arguments:
      algorithm  algorithm specified in src/algorithm/
      {rr,sim}   choose between simulation or real environment mode
After you've chosen your algorithm, you can run a session in either simulation or real environment mode.
python main.py ddpg sim -h
In both simulation and real environment mode you can choose between train and trial mode.

    positional arguments:
      {train,trial}  choose between train or trial
        train        train mode in simulated environment
        trial        trial mode in simulated environment
In train mode you always have to specify a parameters.csv file and an output directory.
python main.py ddpg sim train -h
You can have a look at the parameters.csv example and an example output directory.

    positional arguments:
      hyperparameters  .csv file with hyperparameters for the specified algorithm
      outdir           output directory for your training data
In trial mode you always have to specify a folder containing a parameters.csv and a policy, as well as the number of episodes to run your policy for.
python main.py ddpg sim trial -h
You can try the trial example

    positional arguments:
      policy    path to your policy
      outdir    save your results in the specified directory
      episodes  number of episodes to start your trial in sim mode
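
The policy passed to trial mode is a trained actor network saved during training. As a rough illustration only (a minimal sketch assuming the file is a pickled PyTorch module, which is not confirmed by the repository docs), such a policy could be loaded and queried like this:

```python
# Minimal sketch, assuming the saved policy is a pickled PyTorch actor module.
# The actor class from src/algorithm/ must be importable for unpickling.
import torch

policy = torch.load("test_model/policy")  # hypothetical path, see the trial example below
policy.eval()

obs_dim = 5  # assumed observation size of the chosen environment
observation = torch.zeros(obs_dim)
with torch.no_grad():
    action = policy(observation)
print(action)
```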
For example, you can train the DDPG algorithm with hyperparameters given as a .csv file, such as the following parameters.csv (a sketch of how its columns can be read back follows the table):

run_id | env | steps | batch_size | buffer_size | warmup_samples | actor_lr | critic_lr | actor_hidden_layers | critic_hidden_layers | tau | noise_decay | lr_decay | lr_min | batch_norm | trial_horizon | action_space_limits | dirname |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CartpoleTrial | CartpoleStabShort-v0 | 300000 | 64 | 1000000 | 20000 | 0.001 | 0.01 | [100, 100, 50] | [100, 100] | 0.01 | 0.99 | 1.0 | 1e-08 | False | 5000 | ([-5.0], [5.0]) | out/CartpoleTrial_CartpoleStabShort-v0 |
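
The hidden-layer and action-limit columns hold Python literals. Below is a minimal sketch of reading such a row back, assuming a standard comma-separated file with those columns quoted; the repository's own loader may differ.

```python
# Minimal sketch (assumed file format; not the repository's actual loader):
# read one hyperparameter row and turn the literal-valued columns back into
# Python objects.
import ast
import pandas as pd

row = pd.read_csv("parameters.csv").iloc[0]

actor_layers = ast.literal_eval(row["actor_hidden_layers"])    # e.g. [100, 100, 50]
critic_layers = ast.literal_eval(row["critic_hidden_layers"])  # e.g. [100, 100]
action_limits = ast.literal_eval(row["action_space_limits"])   # e.g. ([-5.0], [5.0])

print(row["env"], actor_layers, critic_layers, action_limits)
```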
Execute this command to obtain results:
python main.py ddpg sim train parameters.csv out
- out specifies the directory where the output results will be saved (its contents are strictly specified by the developer); for more information take a look at the config readme.
- train is the command to train the specified algorithm under the given hyperparameters (the parameters.csv file).
Your output should look something like this, and the given plot will appear in your specified outdir:
To trial your models, choose a model from the outdir that fits your needs. The model files you need are always called actortarget, followed by a number that represents the reward obtained in the training session. Here we choose the model that gained approximately 10,000 reward, create a new folder called test_model, save the chosen model there as policy, and copy the corresponding parameters.csv into test_model as well.
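
One possible way to assemble that folder, as a sketch (the actortarget file name and its location inside the output directory are placeholders; pick the model you actually want from your outdir):

```python
# Sketch: gather the chosen actor model and its hyperparameters in test_model/.
# File names below are placeholders; the real actortarget name carries the
# reward obtained during training.
import shutil
from pathlib import Path

chosen_model = Path("out") / "CartpoleTrial_CartpoleStabShort-v0" / "actortarget10000"

test_model = Path("test_model")
test_model.mkdir(exist_ok=True)
shutil.copy(chosen_model, test_model / "policy")
shutil.copy("parameters.csv", test_model / "parameters.csv")
```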
Now we can execute the model and obtain our results graphically.
python main.py ddpg sim trial test_model result 100
The reward plot for your policy will look like this: you can see that the approximate reward is 10,000. The obtained policy looks like this (we only let the policy render once):
If you want to execute this example on the real environment, just train on the real environment:
python main.py ddpg rr train parameters.csv out
Put your policy and parameters.csv in a new directory, for example test_model, and run:
python main.py ddpg rr trial test_model result 1
After executing this command you will see similar results:
Have fun testing parameters and writing your own algorithms!
If you have problems with training or trial sessions, make sure your output folders are empty and that the hyperparameters file is always named parameters.csv and the policy file is named policy.