
Added updates for HM3D ObjectNav training #818

Open · wants to merge 6 commits into base: main
Conversation

@srama2512 (Contributor) commented Feb 28, 2022:
Motivation and Context

This PR adds the necessary configs for HM3D ObjectNav training. It adds support for training with a SemanticCategory sensor, and it updates the generate_video function to include scene_id and goal_name in the save path.

How Has This Been Tested

Tested locally.

Types of changes

  • Docs change / refactoring / dependency upgrade
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have completed my CLA (see CONTRIBUTING)
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@facebook-github-bot added the "CLA Signed" label (Do not delete this pull request or issue due to inactivity.) on Feb 28, 2022
def get_observation(
    self, *args: Any, observations, episode, **kwargs: Any
):
    episode_uniq_id = f"{episode.scene_id} {episode.episode_id}"
Contributor:

The semantic sensor observation here could be a Tensor, which would create an unnecessary CPU <-> GPU transfer. I think you can easily modify this to operate on both np.ndarray and torch.Tensor.
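A minimal sketch of what the reviewer suggests (the `remap_semantic` helper and its `mapping` argument are hypothetical names, not from the PR): dispatch on the input type so a `torch.Tensor` stays on its device instead of round-tripping through NumPy.

```python
import numpy as np

# Hypothetical sketch: map raw semantic instance ids to task category ids
# without forcing a CPU <-> GPU transfer when the input is a torch.Tensor.
def remap_semantic(semantic, mapping):
    if "torch" in type(semantic).__module__:
        import torch

        # Build the lookup table on the same device as the observation,
        # then index into it; everything stays on the GPU if semantic is there.
        lut = torch.as_tensor(np.asarray(mapping), device=semantic.device)
        return lut[semantic.long()]
    # NumPy path: plain fancy indexing via np.take.
    return np.take(np.asarray(mapping), semantic)
```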

Contributor:

This is caching.

@Skylion007 (Contributor):

@srama2512 This needs testing in the test suite, like so:

"sensor", ["RGB_SENSOR", "DEPTH_SENSOR", "SEMANTIC_SENSOR"]

if "semantic_category" in observation_space.spaces:
    self._n_input_semantic_category = observation_space.spaces["semantic_category"].shape[2]
    spatial_size = observation_space.spaces["semantic_category"].shape[0] // 2
    assert self._n_input_semantic_category == 3, "ResNetEncoder only supports RGB values from SemanticCategory sensor!"
Contributor:

Why does semantic_category have RGB values rather than the int id of the category?

Contributor Author:

Right now, the semantic category sensor outputs an RGB image by default. Each goal category has a unique color; pixels for all other categories and the background are set to zero.
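As a hedged illustration of that encoding (the palette values and the `category_map_to_rgb` name are invented for this sketch, not taken from the PR), the per-category coloring amounts to a palette lookup:

```python
import numpy as np

# Illustrative palette: index 0 covers background and non-goal categories
# (zeros); each goal category gets a unique color. Values are made up here.
PALETTE = np.array(
    [
        [0, 0, 0],    # 0: background / non-goal categories
        [255, 0, 0],  # 1: first goal category
        [0, 255, 0],  # 2: second goal category
    ],
    dtype=np.uint8,
)

def category_map_to_rgb(category_map):
    """Convert an (H, W) int category map to an (H, W, 3) uint8 RGB image."""
    return PALETTE[category_map]
```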

if normalize_visual_inputs:
    self.running_mean_and_var: nn.Module = RunningMeanAndVar(
-       self._n_input_depth + self._n_input_rgb
+       self._n_input_depth + self._n_input_rgb + self._n_input_semantic_category
Contributor:

Normalizing semantic_category inputs doesn't look right.

Contributor Author:

The semantic_category input is forced to be RGB (see earlier comment).

@@ -66,6 +66,7 @@ class PPOTrainer(BaseRLTrainer):
    supported_tasks = ["Nav-v0"]

    SHORT_ROLLOUT_THRESHOLD: float = 0.25
+   SENSORS_BLACKLIST = ["semantic"]
Contributor:

That looks like a hack to do in PPOTrainer. Why do we need to disable this sensor here?

Contributor Author:

Having the semantic value in the observations throws errors in various parts of the code (observation transforms, observation rollouts, etc.). You could try removing it from the blacklist and checking. This seemed like the more elegant solution, but I'm happy to implement a better one based on feedback.
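A minimal sketch of the blacklist approach described above (the standalone `clean_observations` name is hypothetical; in the PR this logic lives in the trainer's `_clean_observations` method):

```python
# Hypothetical sketch: drop blacklisted sensor keys from each observation
# dict before they reach observation transforms and rollout storage.
SENSORS_BLACKLIST = ["semantic"]

def clean_observations(observations):
    return [
        {k: v for k, v in obs.items() if k not in SENSORS_BLACKLIST}
        for obs in observations
    ]
```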

@@ -400,6 +406,15 @@ def _extract_scalars_from_info(

return result

def _clean_observations(self, observations):
Contributor:

We may need to do this a different way.

@@ -1058,16 +1076,21 @@ def _eval_checkpoint(
current_episodes[i].episode_id,
)
] = episode_stats
goal_name = None
Contributor:

That's too specific and should live in observations_to_image as it has access to current_episodes[i].

@srama2512 (Contributor Author) commented Mar 3, 2022:

The idea was to add the goal name to the video save path (so that we can visualize episodes based on the goal). How do you suggest we go about this? Can we pass the current_episodes[i] to generate_video and move the naming logic there?

episode_id=current_episodes[i].episode_id,
checkpoint_idx=checkpoint_index,
metrics=self._extract_scalars_from_info(infos[i]),
tb_writer=writer,
goal_name=goal_name,
Contributor:

At a high level, there should be no task-specific logic in this file.

cat_mapping = HM3D_CATEGORY_TO_TASK_CATEGORY_ID
self.category_to_task_category_id = cat_mapping
if config.RAW_NAME_TO_CATEGORY_MAPPING != "":
    with open(config.RAW_NAME_TO_CATEGORY_MAPPING, "r") as fp:
Contributor:

If we keep this code, it should use a csv reader or similar.
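A sketch of the reviewer's suggestion using the standard `csv` module (the two-column layout of raw name then category, and the `load_raw_name_mapping` name, are assumptions made for this example):

```python
import csv

def load_raw_name_mapping(path):
    """Parse a two-column CSV mapping raw instance name -> category name."""
    with open(path, newline="") as fp:
        # Skip blank rows; csv handles quoted names containing commas.
        return {row[0]: row[1] for row in csv.reader(fp) if row}
```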

@@ -514,6 +521,118 @@ def get_observation(
)


@registry.register_sensor(name="SemanticCategorySensor")
class SemanticCategorySensor(Sensor):
Contributor:

Do we use this sensor for training or only for visualization purposes?

Contributor Author:

This sensor is used for training models with GT semantic goal inputs.

Contributor:

@srama2512, yes. But that GT data isn't available during evaluation, and it isn't clear from this PR how it will be replaced.

# Map from instance id to task id
semantic_category = np.take(self.instance_id_to_task_id, semantic)
if self.config.CONVERT_TO_RGB:
    semantic_category = self.convert_semantic_to_rgb(semantic_category)
Contributor:

RGB conversion should live in def observations_to_image(observation: Dict, info: Dict) -> np.ndarray:.
Something like this:

    if "semantic_category" in observation:
        flat_sem = observation[
            obj_semantic_name
        ]  # to move to same scale #.permute(2, 0, 1).unsqueeze(0).data.max(1)[1].cpu().numpy()[0]
        flat_sem[flat_sem == 41] = 40
        if not isinstance(flat_sem, np.ndarray):
            flat_sem = flat_sem.cpu().numpy()
        semantic_segmentation = (
            color_label(flat_sem).squeeze().transpose(1, 2, 0).astype(np.uint8)
        )
        egocentric_view.append(semantic_segmentation)

        if "objectgoal" in observation and "episode_info" in info:
            from habitat.tasks.nav.object_nav_task import task_cat2mpcat40

            # permute tensor to dimension [CHANNEL x HEIGHT X WIDTH]
            idx = task_cat2mpcat40[observation["objectgoal"][0]]
            goal_segmentation = (
                color_label(flat_sem == idx)
                .squeeze()
                .transpose(1, 2, 0)
                .astype(np.uint8)
            )
            egocentric_view.append(goal_segmentation)

Contributor Author:

This sensor is not for visualization purposes. It is intended to be a model input for training with oracle semantics. So the logic should live within the sensor itself, right?

# LICENSE file in the root directory of this source tree.


GIBSON_CATEGORY_TO_TASK_CATEGORY_ID = {
Contributor:

This file should be somewhere: habitat/datasets/object_nav/....

Co-authored-by: Aaron Gokaslan <[email protected]>
}


MP3D_CATEGORY_TO_TASK_CATEGORY_ID = {
Contributor:

This is different than https://github.com/niessner/Matterport/blob/master/metadata/mpcat40.tsv which will likely lead to confusion.



GIBSON_CATEGORY_TO_TASK_CATEGORY_ID = {
'chair': 0,
Contributor:

Based on "Set invalid instance IDs to unknown object 0" you use 0 as unknown elsewhere.

@mathfac (Contributor) left a comment:

It's still not clear why we need semantic_category represented as RGB for training. It is non-intuitive and can lead to more issues when categories cross datasets.
Since it uses GT semantic information, it's not clear how this PR is used to train the final baseline.

@mathfac (Contributor) commented Aug 30, 2022:

@srama2512, @devendrachaplot how do we want to evolve this PR? Do we have more up-to-date code focused on ObjectNav?

Labels: CLA Signed (Do not delete this pull request or issue due to inactivity.)
Projects: None yet
5 participants