Added updates for HM3D ObjectNav training #818
base: main
Conversation
def get_observation(
    self, *args: Any, observations, episode, **kwargs: Any
):
    episode_uniq_id = f"{episode.scene_id} {episode.episode_id}"
The semantic sensor observation here could be a Tensor, right? That would create an unnecessary CPU <-> GPU transfer. I think you can easily modify this to operate on both np.ndarray and torch.Tensor.
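A minimal sketch of such type dispatch (the helper name and mask logic are hypothetical stand-ins for whatever this sensor computes; the point is staying on the input's device):

import numpy as np
import torch


def _where_goal(semantic, goal_ids):
    # Hypothetical helper: zero out non-goal ids while staying on the
    # input's device, so a CUDA tensor never round-trips through the CPU.
    if isinstance(semantic, torch.Tensor):
        mask = torch.zeros_like(semantic, dtype=torch.bool)
        for gid in goal_ids:
            mask |= semantic == gid
        return semantic * mask
    # NumPy path: np.isin builds the same membership mask in one call.
    return semantic * np.isin(semantic, goal_ids)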
this is caching.
@srama2512 This needs testing in the test suite, like so: habitat-lab/test/test_sensors.py Line 443 in 3f2c7cb
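For instance, a hedged pytest sketch following that pattern (the config path and sensor key are assumptions, not the PR's actual names):

import os

import pytest

import habitat


def test_semantic_category_sensor():
    # Assumed config path and sensor key; mirrors the test/test_sensors.py pattern.
    config = habitat.get_config("configs/tasks/objectnav_hm3d.yaml")
    if not os.path.exists(
        config.DATASET.DATA_PATH.format(split=config.DATASET.SPLIT)
    ):
        pytest.skip("Requires the HM3D ObjectNav dataset.")
    config.defrost()
    config.TASK.SENSORS = config.TASK.SENSORS + ["SEMANTIC_CATEGORY_SENSOR"]
    config.freeze()
    with habitat.Env(config=config) as env:
        obs = env.reset()
        assert "semantic_category" in obs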
if "semantic_category" in observation_space.spaces: | ||
self._n_input_semantic_category = observation_space.spaces["semantic_category"].shape[2] | ||
spatial_size = observation_space.spaces["semantic_category"].shape[0] // 2 | ||
assert self._n_input_semantic_category == 3, "ResNetEncoder only supports RGB values from SemanticCategory sensor!" |
Why does semantic_category have RGB values rather than the int id of the category?
Right now, the semantic category sensor outputs an RGB image by default. For each goal category, there is a unique color. For all other categories and backgrounds, the pixels are set to zeros.
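A minimal sketch of what such a mapping could look like (palette construction and function names are hypothetical, not the PR's exact code):

import numpy as np

def make_category_palette(num_categories: int, seed: int = 42) -> np.ndarray:
    # Hypothetical palette: one fixed, distinct color per goal category;
    # id 0 (background / non-goal) maps to zeros, as described above.
    rng = np.random.default_rng(seed)
    palette = rng.integers(0, 256, size=(num_categories + 1, 3), dtype=np.uint8)
    palette[0] = 0
    return palette

def convert_semantic_to_rgb(semantic_category: np.ndarray, palette: np.ndarray) -> np.ndarray:
    # (H, W) int ids -> (H, W, 3) uint8 image via fancy indexing.
    return palette[semantic_category]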
if normalize_visual_inputs:
    self.running_mean_and_var: nn.Module = RunningMeanAndVar(
-       self._n_input_depth + self._n_input_rgb
+       self._n_input_depth + self._n_input_rgb + self._n_input_semantic_category
normalizing semantic_category inputs doesn't look right.
The semantic_category input is forced to be RGB (see earlier comment).
@@ -66,6 +66,7 @@ class PPOTrainer(BaseRLTrainer):
     supported_tasks = ["Nav-v0"]

     SHORT_ROLLOUT_THRESHOLD: float = 0.25
+    SENSORS_BLACKLIST = ["semantic"]
That looks like a hack to do in PPOTrainer. Why do we need to disable this sensor here?
Having the semantic value throws errors in various parts of the code (observation transforms, observation rollouts, etc.). You could try removing it from the blacklist and checking. This seemed the more elegant solution, but I'm happy to implement a better one based on feedback.
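A sketch of the kind of filtering the blacklist implies, assuming gym-style dict spaces (names approximate the PR's _clean_observations, not verbatim):

from gym import spaces

SENSORS_BLACKLIST = ["semantic"]

def clean_observation_space(obs_space: spaces.Dict) -> spaces.Dict:
    # Drop blacklisted sensors before observation transforms and
    # rollout storage ever see them.
    return spaces.Dict(
        {k: v for k, v in obs_space.spaces.items() if k not in SENSORS_BLACKLIST}
    )

def clean_observations(observations):
    # Same filtering applied to each per-env observation dict.
    return [
        {k: v for k, v in obs.items() if k not in SENSORS_BLACKLIST}
        for obs in observations
    ]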
@@ -400,6 +406,15 @@ def _extract_scalars_from_info(

         return result

+    def _clean_observations(self, observations):
We may need to do this a different way.
@@ -1058,16 +1076,21 @@ def _eval_checkpoint(
                     current_episodes[i].episode_id,
                 )
             ] = episode_stats
+            goal_name = None
That's too specific and should live in observations_to_image, as it has access to current_episodes[i].
The idea was to add the goal name to the video save path (so that we can visualize episodes based on the goal). How do you suggest we go about this? Can we pass the current_episodes[i] to generate_video and move the naming logic there?
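One possible shape for that, sketched with a hypothetical helper that derives the name from the episode inside generate_video (attribute names like object_category are assumptions based on the ObjectNav episode type):

import os

def video_name_from_episode(episode, metrics) -> str:
    # Hypothetical: generate_video receives the episode and builds the
    # save name itself, keeping task-specific fields out of the trainer.
    scene = os.path.basename(episode.scene_id).split(".")[0]
    goal = getattr(episode, "object_category", "unknown")
    metric_str = "-".join(f"{k}={v:.2f}" for k, v in metrics.items())
    return f"episode={episode.episode_id}-scene={scene}-goal={goal}-{metric_str}"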
                     episode_id=current_episodes[i].episode_id,
                     checkpoint_idx=checkpoint_index,
                     metrics=self._extract_scalars_from_info(infos[i]),
                     tb_writer=writer,
+                    goal_name=goal_name,
At a high level, there should be no task-specific logic in this file.
            cat_mapping = HM3D_CATEGORY_TO_TASK_CATEGORY_ID
        self.category_to_task_category_id = cat_mapping
        if config.RAW_NAME_TO_CATEGORY_MAPPING != "":
            with open(config.RAW_NAME_TO_CATEGORY_MAPPING, "r") as fp:
If we keep this code, it should use a proper CSV reader or similar.
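A sketch with the standard csv module (the header names "raw_name" and "category" are assumptions about the mapping file's layout):

import csv

def load_raw_name_to_category(path: str) -> dict:
    # Parse the mapping file with csv.DictReader instead of manual
    # line splitting; assumes a header row with these column names.
    with open(path, newline="") as fp:
        return {row["raw_name"]: row["category"] for row in csv.DictReader(fp)}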
@@ -514,6 +521,118 @@ def get_observation(
         )


+@registry.register_sensor(name="SemanticCategorySensor")
+class SemanticCategorySensor(Sensor):
Do we use this sensor for training or only for visualization purposes?
This sensor is used for training models with GT semantic goal inputs.
@srama2512, yes. But that GT data isn't available during evaluation, and it isn't clear from this PR how it will be replaced.
        # Map from instance id to task id
        semantic_category = np.take(self.instance_id_to_task_id, semantic)
        if self.config.CONVERT_TO_RGB:
            semantic_category = self.convert_semantic_to_rgb(semantic_category)
RGB conversion should live in def observations_to_image(observation: Dict, info: Dict) -> np.ndarray. Something like this:
if "semantic_category" in observation:
flat_sem = observation[
obj_semantic_name
] # to move to same scale #.permute(2, 0, 1).unsqueeze(0).data.max(1)[1].cpu().numpy()[0]
flat_sem[flat_sem == 41] = 40
if not isinstance(flat_sem, np.ndarray):
flat_sem = flat_sem.cpu().numpy()
semantic_segmentation = (
color_label(flat_sem).squeeze().transpose(1, 2, 0).astype(np.uint8)
)
egocentric_view.append(semantic_segmentation)
if "objectgoal" in observation and "episode_info" in info:
from habitat.tasks.nav.object_nav_task import task_cat2mpcat40
# permute tensor to dimension [CHANNEL x HEIGHT X WIDTH]
idx = task_cat2mpcat40[observation["objectgoal"][0]]
goal_segmentation = (
color_label(flat_sem == idx)
.squeeze()
.transpose(1, 2, 0)
.astype(np.uint8)
)
egocentric_view.append(goal_segmentation)
This sensor is not for visualization purposes. It is intended to be a model input for training with oracle semantics. So the logic should live within the sensor itself, right?
# LICENSE file in the root directory of this source tree.


GIBSON_CATEGORY_TO_TASK_CATEGORY_ID = {
This file should live somewhere under habitat/datasets/object_nav/...
}


MP3D_CATEGORY_TO_TASK_CATEGORY_ID = {
This is different from https://github.com/niessner/Matterport/blob/master/metadata/mpcat40.tsv, which will likely lead to confusion.

GIBSON_CATEGORY_TO_TASK_CATEGORY_ID = {
    'chair': 0,
Based on "Set invalid instance IDs to unknown object 0" you use 0
as unknown elsewhere.
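A sketch of one way to avoid that clash, reserving 0 for unknown by shifting the task ids (the six-category list is illustrative, extrapolated from the 'chair': 0 entry shown here):

# Illustrative only: shift task ids so that 0 is reserved for unknown /
# invalid instances, avoiding the collision with 'chair': 0.
GOAL_CATEGORIES = ["chair", "couch", "potted plant", "bed", "toilet", "tv"]  # assumed set
GIBSON_CATEGORY_TO_TASK_CATEGORY_ID = {
    name: idx + 1 for idx, name in enumerate(GOAL_CATEGORIES)
}
UNKNOWN_TASK_CATEGORY_ID = 0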
That's still not clear: why do we need semantic_category represented as RGB for training? It's non-intuitive and can lead to more issues when categories cross datasets. And since it uses GT semantic information, it's not clear how this PR is used to train the final baseline.
@srama2512, @devendrachaplot, how do we want to evolve this PR? Do we have more up-to-date code focused on ObjectNav?
Motivation and Context
This PR adds the necessary configs for HM3D ObjectNav training. It includes new features for training with a SemanticCategory sensor. It also updates the generate_video function to include scene_id and goal_name in the save path.
How Has This Been Tested
Tested locally.
Types of changes
Checklist