Vision-based Chain-of-Thought Predictive Control
Currently the code includes four tasks for state-based imitation learning: `PickCube-v0`, `StackCube-v0`, `PegInsertionSide-v0`, and `TurnFaucet-v0`.
The state-based demo trajectories used in the CoTPC paper are stored in this Google Drive folder.
Each folder contains a `*.h5` file (the actual trajectories) and a `*.json` file (metadata about the trajectories).
Each task has approximately 1000 trajectories, and each trajectory corresponds to a different env variation (i.e., env seed).
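For a quick look at the data, a demo file can be inspected with `h5py` and `json`. This is only a sketch: the paths are placeholders, and the group/dataset names (`traj_0`, `obs`, `actions`) follow the usual ManiSkill2 trajectory format, so double-check them against the files you download.

```python
import json
import h5py

# Placeholder paths; point them at one of the downloaded demo folders.
h5_path = "PickCube-v0/trajectory.h5"
json_path = "PickCube-v0/trajectory.json"

# The JSON file stores metadata such as the env id and per-episode reset kwargs (env seeds).
with open(json_path, "r") as f:
    meta = json.load(f)
print(meta.keys())

# The H5 file stores the actual trajectories, typically one group per trajectory
# (e.g., "traj_0") with datasets such as "obs" and "actions".
with h5py.File(h5_path, "r") as data:
    traj = data["traj_0"]
    print(traj["obs"].shape, traj["actions"].shape)
```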
These demos are generated by replaying the official demos provided by ManiSkill2 using `replay_trajectory.py` with `control_mode="pd_joint_delta_pos"`, `obs_mode="state"`, and several patches to the ManiSkill2 code (see `maniskill2_patches`).
Specifically, we add additional flags to the tasks so that the key states (the Chain-of-Thought) can be obtained with privileged information from the simulator.
For the task `TurnFaucet-v0`, we use a subset of 10 faucet models for the demos (see `scripts/replay_turn_faucet_trajectories.sh`).
If you want to generate vision-based demos, please refer to the official ManiSkill2 repo.
The data loader samples (contiguous) subarrays of demo trajectories, given minimum and maximum sample lengths.
In CoTPC, we simply use the same fixed value for both the min and the max (e.g., a context size of 60 for all tasks).
With a fixed random seed `seed` and a fixed `num_traj`, the data loader selects the same subset of all trajectories every time.
In all experiments from the paper, we use `seed=0` and `num_traj=500`.
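A minimal sketch of how such sampling could work is given below. The actual loader lives in the repo's data code; the function and field names here are illustrative only, and each trajectory is assumed to be a dict of equal-length arrays.

```python
import numpy as np

def sample_windows(trajectories, num_traj=500, min_len=60, max_len=60, seed=0):
    """Pick a fixed subset of trajectories and sample contiguous windows (illustrative sketch).

    With a fixed `seed` and `num_traj`, the selected subset is deterministic,
    matching the behavior described above.
    """
    rng = np.random.RandomState(seed)
    # Deterministically choose which trajectories are used for training.
    idx = rng.choice(len(trajectories), size=num_traj, replace=False)
    selected = [trajectories[i] for i in idx]

    windows = []
    for traj in selected:
        length = rng.randint(min_len, max_len + 1)  # fixed when min_len == max_len
        start = rng.randint(0, max(1, len(traj["actions"]) - length + 1))
        windows.append({k: v[start:start + length] for k, v in traj.items()})
    return windows
```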
For TurnFaucet, due to the variations across the different faucet models, the loader samples such that the number of trajectories per faucet model is the same (hopefully this balanced data eases model training).
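The balanced subset could be drawn along the following lines; again a sketch only, where the grouping key (one faucet model id per trajectory) and the counts are assumptions.

```python
from collections import defaultdict
import numpy as np

def balanced_subset(trajectories, model_ids, num_traj=500, seed=0):
    """Sample an equal number of trajectories per faucet model (illustrative sketch)."""
    rng = np.random.RandomState(seed)
    by_model = defaultdict(list)
    for traj, model_id in zip(trajectories, model_ids):
        by_model[model_id].append(traj)

    # e.g., 500 // 10 = 50 trajectories per faucet model; assumes each model
    # has at least this many demos available.
    per_model = num_traj // len(by_model)
    subset = []
    for trajs in by_model.values():
        idx = rng.choice(len(trajs), size=per_model, replace=False)
        subset.extend(trajs[i] for i in idx)
    return subset
```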
The key states (a.k.a. the Chain-of-Thought) can be obtained with the function `get_key_states(...)`, which accesses privileged info available during training.
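As a rough illustration of what privileged key-state extraction can look like, the sketch below records the env state whenever a sub-goal flag flips to true. The flag names and data layout are hypothetical; the real flags come from the `maniskill2_patches` changes and `get_key_states(...)` itself.

```python
import numpy as np

def get_key_states_sketch(env_states, info_flags):
    """Illustrative sketch: collect the env state at each sub-goal boundary.

    `info_flags` is assumed to be a list of dicts of boolean sub-goal flags
    (e.g., "is_grasped") exposed by the patched envs; the actual flag names
    and the real get_key_states(...) implementation may differ.
    """
    key_states = []
    prev = {}
    for state, flags in zip(env_states, info_flags):
        for name, value in flags.items():
            # Record the state the first time a sub-goal flag becomes True.
            if value and not prev.get(name, False):
                key_states.append(state)
        prev = flags
    return np.stack(key_states) if key_states else np.empty((0,))
```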
The patches to the environments in ManiSkill2 described previously also provide additional evaluation metrics (intermediate success rates) for trajectories. To evaluate a model using the same set of env seeds as the ones used in the demos (or using a specific set of env seeds), please refer to this official doc.
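A minimal evaluation loop over a fixed list of env seeds might look like the sketch below. The random action is a stand-in for the trained policy, and the exact `reset`/`step` signatures and info keys depend on your gym and ManiSkill2 versions, so treat this as an assumption-laden example rather than the repo's evaluation code.

```python
import gym
import mani_skill2.envs  # registers the ManiSkill2 environments

env = gym.make("PickCube-v0", obs_mode="state", control_mode="pd_joint_delta_pos")

eval_seeds = list(range(100))  # replace with the env seeds you want to evaluate on
successes = 0
for seed in eval_seeds:
    obs = env.reset(seed=seed)
    done, info = False, {}
    while not done:
        action = env.action_space.sample()  # replace with the trained policy's action
        obs, reward, done, info = env.step(action)
    successes += int(info.get("success", False))

print(f"Success rate: {successes / len(eval_seeds):.3f}")
env.close()
```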
Please see `scripts/train.sh` for examples.