- Practice: https://colab.research.google.com/drive/1-2IUh717LBaZadyNRWJgrNp8nggYUShu?usp=sharing
- change to GPU device
- [top right dropdown] connect to a hosted runtime
- [RAM/Disk label] check whether you are using a GPU backend
- if not, open [Change runtime type] and set the Hardware accelerator to GPU
- clone this repository
cd ~
git clone https://github.com/Ootang2019/deepfield2023.git
- install dependencies (Python 3.7+, PyTorch>=1.11)
pip install stable-baselines3 pyyaml rospkg numpy
- start simulation
- terminal1:
cd ~/deepfield2023
roscore
- terminal2:
cd ~/deepfield2023
roslaunch multisim_turtle.launch
- terminal3:
cd ~/deepfield2023
python rl.py
- clean simulation
- press Ctrl+C in all terminals
- clean up ROS artifacts
cd ~/deepfield2023
bash cleanup.sh
- Reward Engineering:
- add different penalty terms to the reward function: in turtle_sim.py, modify the reward weight array self.rew_w and the compute_reward() function
- clip the reward to the range [-1, 1] to reduce the reward variance
- increase the reward weights in self.rew_w
- add a penalty for hitting the wall (see the sketch after this list)
- wall positions: x=0, x=11.1, y=0, y=11.1
- add wall detection to the observe() function
- add a penalty term in compute_reward() and a matching weight in self.rew_w
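A minimal sketch of such a reward, with a wall penalty and clipping, written as a standalone function. The real compute_reward() in turtle_sim.py is a method and its self.rew_w layout may differ; the margin, weights, and argument names below are assumptions for illustration.

```python
import numpy as np

WALL_LO, WALL_HI = 0.0, 11.1   # wall positions from the task description
WALL_MARGIN = 0.3              # hypothetical threshold for "hitting" the wall

def near_wall(x, y, margin=WALL_MARGIN):
    """True if the turtle is within `margin` of any wall."""
    return not (WALL_LO + margin < x < WALL_HI - margin and
                WALL_LO + margin < y < WALL_HI - margin)

def compute_reward(x, y, goal_x, goal_y, rew_w=(0.1, 0.5)):
    """Distance term plus wall penalty, clipped to [-1, 1].

    rew_w mirrors the role of self.rew_w in turtle_sim.py:
    rew_w[0] weights the distance term, rew_w[1] the wall penalty.
    """
    dist_to_goal = np.hypot(goal_x - x, goal_y - y)
    reward = -rew_w[0] * dist_to_goal
    if near_wall(x, y):
        reward -= rew_w[1]                      # penalty for hitting the wall
    return float(np.clip(reward, -1.0, 1.0))    # clip to reduce reward variance

# Example: turtle near the left wall, goal at the center of the arena
print(compute_reward(0.2, 5.0, goal_x=5.5, goal_y=5.5))
```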
- Hyper-parameter Tuning:
- in rl.py, modify the PPO hyper-parameters or the policy network architecture (see the example below)
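For illustration, the PPO constructor in rl.py could be given explicit hyper-parameters and a wider policy network. The values below are starting points, not tuned settings, and the `from turtle_sim import TurtleEnv` import path is an assumption about the repo layout.

```python
from stable_baselines3 import PPO
from turtle_sim import TurtleEnv   # import path assumed from the repo layout

env = TurtleEnv()
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,     # try values in the range 1e-4 ... 1e-3
    n_steps=2048,           # rollout length per update
    batch_size=64,
    gamma=0.99,             # discount factor
    ent_coef=0.01,          # entropy bonus, also helps exploration
    policy_kwargs=dict(net_arch=[128, 128]),  # two hidden layers of 128 units
    verbose=1,
)
model.learn(total_timesteps=100_000)
```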
- Residual RL:
- add a baseline PID controller to the environment
- mix the PID and RL commands to control the turtle (see the sketch below)
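A minimal residual-control sketch, assuming the observation contains the distance and heading error to the goal and the action is (linear velocity, angular velocity); the gains, indices, and limits are placeholders to adapt to the real TurtleEnv.

```python
import numpy as np

class PID:
    """Minimal PI controller on a scalar error (gains are illustrative)."""
    def __init__(self, kp=1.0, ki=0.0, dt=0.1):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def __call__(self, error):
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral

# Baseline controllers for forward speed and heading (gains are guesses)
speed_pid = PID(kp=0.5)
yaw_pid = PID(kp=2.0)

def residual_action(obs, rl_action, max_lin=2.0, max_ang=2.0):
    """Mix the PID baseline with the RL correction.

    Assumes obs = (dist_to_goal, heading_error, ...) and action =
    (linear_vel, angular_vel); adapt the indices to the real observe().
    """
    dist_to_goal, heading_error = obs[0], obs[1]
    baseline = np.array([speed_pid(dist_to_goal), yaw_pid(heading_error)])
    action = baseline + rl_action                  # RL learns only the residual
    return np.clip(action, [-max_lin, -max_ang], [max_lin, max_ang])
```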
- Curriculum learning:
- start with a goal that is easy to reach and progressively make the task harder as training proceeds (see the sketch below)
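One simple curriculum sketch, assuming you control where the goal is sampled at reset: keep the goal within a radius of the turtle and widen that radius during training. The class and all parameter values are hypothetical.

```python
import numpy as np

class GoalCurriculum:
    """Grow the goal-sampling radius from easy to hard over training."""
    def __init__(self, start_radius=1.0, max_radius=5.0, grow_every=20_000, grow_by=0.5):
        self.radius = start_radius
        self.max_radius = max_radius
        self.grow_every = grow_every
        self.grow_by = grow_by
        self.steps = 0

    def sample_goal(self, turtle_x, turtle_y, low=0.5, high=10.6):
        """Sample a goal within the current radius, clipped to the arena."""
        angle = np.random.uniform(0, 2 * np.pi)
        r = np.random.uniform(0, self.radius)
        gx = np.clip(turtle_x + r * np.cos(angle), low, high)
        gy = np.clip(turtle_y + r * np.sin(angle), low, high)
        return gx, gy

    def step(self):
        """Call once per environment step; widen the radius over time."""
        self.steps += 1
        if self.steps % self.grow_every == 0:
            self.radius = min(self.radius + self.grow_by, self.max_radius)
```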
- Improve exploration:
- add an exploration bonus to the reward to encourage the agent to discover new states (see the sketch below)
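A count-based bonus sketch that discretizes the turtle's (x, y) position into grid cells and rewards rarely visited cells. The cell size and scale are illustrative, and the bonus would be added to the reward inside compute_reward() or in a wrapper.

```python
from collections import defaultdict
import numpy as np

class CountBonus:
    """Count-based exploration bonus on a discretized (x, y) grid."""
    def __init__(self, cell_size=1.0, scale=0.05):
        self.cell_size = cell_size
        self.scale = scale
        self.counts = defaultdict(int)

    def __call__(self, x, y):
        cell = (int(x / self.cell_size), int(y / self.cell_size))
        self.counts[cell] += 1
        # Bonus decays as 1/sqrt(visit count), so new cells pay the most
        return self.scale / np.sqrt(self.counts[cell])

bonus = CountBonus()
# reward += bonus(x, y)   # e.g. inside compute_reward(), using the turtle position
```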
- Try different agent:
- in rl.py, import another agent and replace PPO (see the example below)
from stable_baselines3 import DDPG, SAC, TD3
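For example, swapping PPO for SAC (SAC, DDPG, and TD3 are off-policy and require a continuous action space; the `from turtle_sim import TurtleEnv` import is an assumption about the repo layout):

```python
from stable_baselines3 import SAC
from turtle_sim import TurtleEnv   # import path assumed from the repo layout

env = TurtleEnv()
model = SAC("MlpPolicy", env, learning_rate=3e-4, buffer_size=100_000, verbose=1)
model.learn(total_timesteps=100_000)
model.save("sac_turtle")
```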
- Customize PPO:
- implement your own PPO from the code in the Colab notebook to get maximum control over the training loop (see the loss snippet below)
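The core of such a custom loop is the clipped surrogate objective; a standalone PyTorch sketch of just that loss term (variable names in the Colab notebook will differ):

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate loss for one batch of actions.

    log_prob_new: log pi_theta(a|s) under the current policy
    log_prob_old: log pi_theta_old(a|s) stored during the rollout
    advantage:    advantage estimates (e.g. from GAE), same shape
    """
    ratio = torch.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()   # maximize surrogate = minimize its negative

# Example with dummy tensors
lp_new = torch.randn(5, requires_grad=True)
loss = ppo_clip_loss(lp_new, torch.randn(5), torch.randn(5))
loss.backward()
```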
- Try Harder Env (dynamic goal):
- in rl.py, replace TurtleEnv with TurtleEnv_hard (see the example below)
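Assuming TurtleEnv_hard is defined next to TurtleEnv in turtle_sim.py and takes the same constructor arguments, the change in rl.py looks like:

```python
from stable_baselines3 import PPO
from turtle_sim import TurtleEnv_hard   # import path and constructor assumed

env = TurtleEnv_hard()                  # dynamic-goal variant of the task
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```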
- Action Smoothness:
- incorporate an accumulative action space, where the policy outputs changes to the previous command
- penalize changes between consecutive actions (see the wrapper sketch below)
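A sketch of both ideas as a wrapper, assuming a Box action space and the classic gym step API (obs, reward, done, info): the policy outputs increments that are accumulated into the actual command, and changes between consecutive commands are penalized. The weight and class name are placeholders.

```python
import gym
import numpy as np

class SmoothActionWrapper(gym.Wrapper):
    """Accumulative actions plus a penalty on action changes."""
    def __init__(self, env, smooth_w=0.1):
        super().__init__(env)
        self.smooth_w = smooth_w
        self.cmd = np.zeros(env.action_space.shape, dtype=np.float32)

    def reset(self, **kwargs):
        self.cmd[:] = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        prev_cmd = self.cmd.copy()
        # Accumulative action space: the policy outputs a change in command
        self.cmd = np.clip(self.cmd + action,
                           self.env.action_space.low, self.env.action_space.high)
        obs, reward, done, info = self.env.step(self.cmd)
        # Penalize large changes between consecutive commands
        reward -= self.smooth_w * float(np.linalg.norm(self.cmd - prev_cmd))
        return obs, reward, done, info
```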