Optimistic Reinforcement Learning Task and Motion Planning (ORL-TAMP) is a framework that integrates RL policies into TAMP pipelines. The general idea is to encapsulate an RL policy into a so-called skill: a unit comprising the RL policy, a state discriminator, and a sub-goal generator. Besides steering the robot's actions, these components are used to verify symbolic predicates and to ground geometric values.
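The snippet below is a minimal sketch of that skill abstraction. The names (`Skill`, `applicable`, `ground`, `act`) are illustrative assumptions, not the repository's actual interfaces; it only shows how a policy, a state discriminator, and a sub-goal generator fit together.

```python
# Illustrative sketch of the skill abstraction (hypothetical names, not the repo's API).
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class Skill:
    policy: Callable[[np.ndarray], np.ndarray]              # maps an observation to an action
    state_discriminator: Callable[[np.ndarray], bool]       # does the skill's predicate hold here?
    subgoal_generator: Callable[[np.ndarray], np.ndarray]   # proposes a geometric sub-goal

    def applicable(self, state: np.ndarray) -> bool:
        """Verify the skill's symbolic precondition on a geometric state."""
        return self.state_discriminator(state)

    def ground(self, state: np.ndarray) -> np.ndarray:
        """Ground the skill's effect as a concrete geometric sub-goal."""
        return self.subgoal_generator(state)

    def act(self, observation: np.ndarray) -> np.ndarray:
        """Steer the robot toward the sub-goal with the RL policy."""
        return self.policy(observation)
```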
The method introduction and experiments:
The current version is tested on Ubuntu 20.04.
- Dependencies:
  - MoveIt (ROS Noetic). We are currently trying to remove the dependency on MoveIt due to its inflexibility and ROS specificity.
- Build the PDDL FastDownward solver:

  `orl_tamp$ ./downward/build.py`
- Compile the IK solver:

  `orl_tamp$ cd utils/pybullet_tools/ikfast/franka_panda/`

  `franka_panda$ python setup.py`
- Download the RL policy models (Retrieve and EdgePush) and save the policies in the `/orl_tamp/policies` folder (a quick loading check is sketched after the demo list below).
- Run MoveIt (following the tutorial).
- Run demos:
  - Retrieve: `orl_tamp$ ./run_demo.sh retrieve`
  - EdgePush: `orl_tamp$ ./run_demo.sh edgepush`
  - Rearrange: `orl_tamp$ ./run_demo.sh rearrange`
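Before launching a demo, it can help to confirm that a downloaded policy actually loads. The snippet below is a minimal sketch: it assumes the policies are Stable-Baselines3 SAC checkpoints saved as, e.g., `retrieve.zip` with a flat (Box) observation space, which may differ from the repository's actual setup.

```python
# Sanity check that a downloaded policy loads and produces an action.
# File name, algorithm (SAC), and observation format are assumptions.
import numpy as np
from stable_baselines3 import SAC

policy = SAC.load("orl_tamp/policies/retrieve")   # expects retrieve.zip in the policies folder
obs = np.zeros(policy.observation_space.shape, dtype=np.float32)
action, _state = policy.predict(obs, deterministic=True)
print("sampled action:", action)
```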
Train skills:

This section gives the general steps for training your own skills.
- Modify the PDDL domain file and stream file to add the PDDL definitions of the skills.
- Use Stable-Baselines3 to standardize the policy training (a training sketch follows this list).
- Generate a dataset in the domain scenario.
- Train the state discriminator (a discriminator sketch follows this list).
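For the policy-training step, the usual Stable-Baselines3 workflow applies. The sketch below is illustrative only: the placeholder environment (`Pendulum-v1` standing in for your own Gym-style skill environment), the algorithm (SAC), the timestep budget, and the save path are all assumptions.

```python
# Illustrative Stable-Baselines3 training loop for a new skill policy.
# Replace the placeholder environment with a Gym-style environment that
# wraps your domain scenario (e.g., a PyBullet scene for the new skill).
import gymnasium as gym
from stable_baselines3 import SAC

env = gym.make("Pendulum-v1")              # placeholder continuous-control environment
model = SAC("MlpPolicy", env, verbose=1)   # SAC is one reasonable choice for continuous actions
model.learn(total_timesteps=100_000)       # adjust the budget to your skill
model.save("orl_tamp/policies/my_skill")   # store next to the provided policies
```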
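For the state discriminator, one straightforward option is a small binary classifier trained on the generated dataset to predict whether the skill's predicate holds in a given state. The sketch below uses synthetic data; the state dimension, network size, training schedule, and file names are placeholders.

```python
# Hedged sketch of training a state discriminator as a binary classifier.
import numpy as np
import torch
import torch.nn as nn

# Placeholder dataset: N state vectors with 0/1 labels (1 = predicate holds).
states = torch.tensor(np.random.rand(1024, 12), dtype=torch.float32)
labels = torch.tensor(np.random.randint(0, 2, (1024, 1)), dtype=torch.float32)

discriminator = nn.Sequential(
    nn.Linear(states.shape[1], 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),  # logit for "skill applicable in this state"
)

optimizer = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(discriminator(states), labels)
    loss.backward()
    optimizer.step()

torch.save(discriminator.state_dict(), "orl_tamp/policies/my_skill_discriminator.pt")
```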