- Practice: https://colab.research.google.com/drive/1-2IUh717LBaZadyNRWJgrNp8nggYUShu?usp=sharing
- change to GPU device
- [top right dropdown] connect to a hosted runtime
- [RAM/Disk label] check whether you are using a GPU backend
- if not, open [Change runtime type] and set the Hardware accelerator to GPU
- clone this repository
cd ~
git clone https://github.com/Ootang2019/deepfield2023.git
- install dependencies (Python 3.7+, PyTorch>=1.11)
pip install stable-baselines3 pyyaml rospkg numpy
- start simulation
- terminal1:
cd ~/deepfield2023
roscore
- terminal2:
cd ~/deepfield2023
roslaunch multisim_turtle.launch
- terminal3:
cd ~/deepfield2023
python rl.py
- clean simulation
- press Ctrl+C in all terminals
- clean up ROS artifacts
cd ~/deepfield2023
bash cleanup.sh
- Reward Engineering:
- add different penalty terms to the reward function: in turtle_sim.py, modify the reward weight array self.rew_w and the compute_reward() function
- clip the reward to the range [-1, 1] to reduce the reward variance
- increase the reward weights in self.rew_w
- add a penalty for hitting the wall (see the sketch after this list)
- wall positions: x=0, x=11.1, y=0, y=11.1
- add wall detection to the observe() function
- add a penalty term in compute_reward() and a matching weight in self.rew_w
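A minimal sketch of such a reward, with a wall penalty and clipping, written as a standalone function. The real compute_reward() in turtle_sim.py is a method and its self.rew_w layout may differ; the margin, weights, and argument names below are assumptions for illustration.

```python
import numpy as np

WALL_LO, WALL_HI = 0.0, 11.1   # wall positions from the task description
WALL_MARGIN = 0.3              # hypothetical threshold for "hitting" the wall

def near_wall(x, y, margin=WALL_MARGIN):
    """True if the turtle is within `margin` of any wall."""
    return not (WALL_LO + margin < x < WALL_HI - margin and
                WALL_LO + margin < y < WALL_HI - margin)

def compute_reward(x, y, goal_x, goal_y, rew_w=(0.1, 0.5)):
    """Distance term plus wall penalty, clipped to [-1, 1].

    rew_w mirrors the role of self.rew_w in turtle_sim.py:
    rew_w[0] weights the distance term, rew_w[1] the wall penalty.
    """
    dist_to_goal = np.hypot(goal_x - x, goal_y - y)
    reward = -rew_w[0] * dist_to_goal
    if near_wall(x, y):
        reward -= rew_w[1]                      # penalty for hitting the wall
    return float(np.clip(reward, -1.0, 1.0))    # clip to reduce reward variance

# Example: turtle near the left wall, goal at the center of the arena
print(compute_reward(0.2, 5.0, goal_x=5.5, goal_y=5.5))
```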
- Hyper-parameter Tuning:
- in rl.py, modify the PPO hyper-parameters or the policy network architecture (see the example below)
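For illustration, the PPO constructor in rl.py could be given explicit hyper-parameters and a wider policy network. The values below are starting points, not tuned settings, and the `from turtle_sim import TurtleEnv` import path is an assumption about the repo layout.

```python
from stable_baselines3 import PPO
from turtle_sim import TurtleEnv   # import path assumed from the repo layout

env = TurtleEnv()
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,     # try values in the range 1e-4 ... 1e-3
    n_steps=2048,           # rollout length per update
    batch_size=64,
    gamma=0.99,             # discount factor
    ent_coef=0.01,          # entropy bonus, also helps exploration
    policy_kwargs=dict(net_arch=[128, 128]),  # two hidden layers of 128 units
    verbose=1,
)
model.learn(total_timesteps=100_000)
```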
- Residual RL:
- add a baseline PID controller to the environment
- mix the PID and RL commands to control the turtle (see the sketch below)
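A minimal residual-control sketch, assuming the observation contains the distance and heading error to the goal and the action is (linear velocity, angular velocity); the gains, indices, and limits are placeholders to adapt to the real TurtleEnv.

```python
import numpy as np

class PID:
    """Minimal PI controller on a scalar error (gains are illustrative)."""
    def __init__(self, kp=1.0, ki=0.0, dt=0.1):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def __call__(self, error):
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral

# Baseline controllers for forward speed and heading (gains are guesses)
speed_pid = PID(kp=0.5)
yaw_pid = PID(kp=2.0)

def residual_action(obs, rl_action, max_lin=2.0, max_ang=2.0):
    """Mix the PID baseline with the RL correction.

    Assumes obs = (dist_to_goal, heading_error, ...) and action =
    (linear_vel, angular_vel); adapt the indices to the real observe().
    """
    dist_to_goal, heading_error = obs[0], obs[1]
    baseline = np.array([speed_pid(dist_to_goal), yaw_pid(heading_error)])
    action = baseline + rl_action                  # RL learns only the residual
    return np.clip(action, [-max_lin, -max_ang], [max_lin, max_ang])
```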
- Curriculum learning:
- start with a goal that is easy to reach and progressively make the task harder as training proceeds (see the sketch below)
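One simple curriculum sketch, assuming you control where the goal is sampled at reset: keep the goal within a radius of the turtle and widen that radius during training. The class and all parameter values are hypothetical.

```python
import numpy as np

class GoalCurriculum:
    """Grow the goal-sampling radius from easy to hard over training."""
    def __init__(self, start_radius=1.0, max_radius=5.0, grow_every=20_000, grow_by=0.5):
        self.radius = start_radius
        self.max_radius = max_radius
        self.grow_every = grow_every
        self.grow_by = grow_by
        self.steps = 0

    def sample_goal(self, turtle_x, turtle_y, low=0.5, high=10.6):
        """Sample a goal within the current radius, clipped to the arena."""
        angle = np.random.uniform(0, 2 * np.pi)
        r = np.random.uniform(0, self.radius)
        gx = np.clip(turtle_x + r * np.cos(angle), low, high)
        gy = np.clip(turtle_y + r * np.sin(angle), low, high)
        return gx, gy

    def step(self):
        """Call once per environment step; widen the radius over time."""
        self.steps += 1
        if self.steps % self.grow_every == 0:
            self.radius = min(self.radius + self.grow_by, self.max_radius)
```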
- Improve exploration:
- add an exploration bonus to the reward to encourage the agent to discover new states (see the sketch below)
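A count-based bonus sketch that discretizes the turtle's (x, y) position into grid cells and rewards rarely visited cells. The cell size and scale are illustrative, and the bonus would be added to the reward inside compute_reward() or in a wrapper.

```python
from collections import defaultdict
import numpy as np

class CountBonus:
    """Count-based exploration bonus on a discretized (x, y) grid."""
    def __init__(self, cell_size=1.0, scale=0.05):
        self.cell_size = cell_size
        self.scale = scale
        self.counts = defaultdict(int)

    def __call__(self, x, y):
        cell = (int(x / self.cell_size), int(y / self.cell_size))
        self.counts[cell] += 1
        # Bonus decays as 1/sqrt(visit count), so new cells pay the most
        return self.scale / np.sqrt(self.counts[cell])

bonus = CountBonus()
# reward += bonus(x, y)   # e.g. inside compute_reward(), using the turtle position
```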
- Try different agent:
- in rl.py, import another agent and replace PPO (see the example below)
from stable_baselines3 import DDPG, SAC, TD3
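For example, swapping PPO for SAC (SAC, DDPG, and TD3 are off-policy and require a continuous action space; the `from turtle_sim import TurtleEnv` import is an assumption about the repo layout):

```python
from stable_baselines3 import SAC
from turtle_sim import TurtleEnv   # import path assumed from the repo layout

env = TurtleEnv()
model = SAC("MlpPolicy", env, learning_rate=3e-4, buffer_size=100_000, verbose=1)
model.learn(total_timesteps=100_000)
model.save("sac_turtle")
```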
- Customize PPO:
- implement your own PPO from the code in the Colab notebook to get maximum control over the training loop (see the loss snippet below)
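The core of such a custom loop is the clipped surrogate objective; a standalone PyTorch sketch of just that loss term (variable names in the Colab notebook will differ):

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate loss for one batch of actions.

    log_prob_new: log pi_theta(a|s) under the current policy
    log_prob_old: log pi_theta_old(a|s) stored during the rollout
    advantage:    advantage estimates (e.g. from GAE), same shape
    """
    ratio = torch.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()   # maximize surrogate = minimize its negative

# Example with dummy tensors
lp_new = torch.randn(5, requires_grad=True)
loss = ppo_clip_loss(lp_new, torch.randn(5), torch.randn(5))
loss.backward()
```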
- Try Harder Env (dynamic goal):
- in rl.py, replace TurtleEnv with TurtleEnv_hard (see the example below)
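Assuming TurtleEnv_hard is defined next to TurtleEnv in turtle_sim.py and takes the same constructor arguments, the change in rl.py looks like:

```python
from stable_baselines3 import PPO
from turtle_sim import TurtleEnv_hard   # import path and constructor assumed

env = TurtleEnv_hard()                  # dynamic-goal variant of the task
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```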
- Action Smoothness:
- incorporate an accumulative action space, where the policy outputs changes to the previous command
- penalize changes between consecutive actions (see the wrapper sketch below)
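A sketch of both ideas as a wrapper, assuming a Box action space and the classic gym step API (obs, reward, done, info): the policy outputs increments that are accumulated into the actual command, and changes between consecutive commands are penalized. The weight and class name are placeholders.

```python
import gym
import numpy as np

class SmoothActionWrapper(gym.Wrapper):
    """Accumulative actions plus a penalty on action changes."""
    def __init__(self, env, smooth_w=0.1):
        super().__init__(env)
        self.smooth_w = smooth_w
        self.cmd = np.zeros(env.action_space.shape, dtype=np.float32)

    def reset(self, **kwargs):
        self.cmd[:] = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        prev_cmd = self.cmd.copy()
        # Accumulative action space: the policy outputs a change in command
        self.cmd = np.clip(self.cmd + action,
                           self.env.action_space.low, self.env.action_space.high)
        obs, reward, done, info = self.env.step(self.cmd)
        # Penalize large changes between consecutive commands
        reward -= self.smooth_w * float(np.linalg.norm(self.cmd - prev_cmd))
        return obs, reward, done, info
```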