Added updates for HM3D ObjectNav training #818

Open · wants to merge 6 commits into main
1 change: 1 addition & 0 deletions README.md
@@ -213,6 +213,7 @@ Download the Habitat related Gibson dataset following the instructions [here](ht
| [Point goal navigation](https://arxiv.org/abs/1807.06757) | MatterPort3D | [pointnav_mp3d_v1.zip](https://dl.fbaipublicfiles.com/habitat/data/datasets/pointnav/mp3d/v1/pointnav_mp3d_v1.zip) | `data/datasets/pointnav/mp3d/v1/` | [`datasets/pointnav/mp3d.yaml`](configs/datasets/pointnav/mp3d.yaml) | 400 MB |
| 🆕[Point goal navigation](https://arxiv.org/abs/1807.06757) | HM3D | [pointnav_hm3d_v1.zip](https://dl.fbaipublicfiles.com/habitat/data/datasets/pointnav/hm3d/v1/pointnav_hm3d_v1.zip) | `data/datasets/pointnav/hm3d/v1/` | [`datasets/pointnav/hm3d.yaml`](configs/datasets/pointnav/hm3d.yaml) | 992 MB |
| [Object goal navigation](https://arxiv.org/abs/2006.13171) | MatterPort3D | [objectnav_mp3d_v1.zip](https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/m3d/v1/objectnav_mp3d_v1.zip) | `data/datasets/objectnav/mp3d/v1/` | [`datasets/objectnav/mp3d.yaml`](configs/datasets/objectnav/mp3d.yaml) | 170 MB |
| 🆕[Object goal navigation](https://arxiv.org/abs/2006.13171) | HM3D | [objectnav_hm3d_v1.zip](https://dl.fbaipublicfiles.com/habitat/data/datasets/objectnav/hm3d/v1/objectnav_hm3d_v1.zip) | `data/datasets/objectnav/hm3d/v1/` | [`datasets/objectnav/hm3d.yaml`](configs/datasets/objectnav/hm3d.yaml) | 154 MB |
| [Embodied Question Answering](https://embodiedqa.org/) | MatterPort3D | [eqa_mp3d_v1.zip](https://dl.fbaipublicfiles.com/habitat/data/datasets/eqa/mp3d/v1/eqa_mp3d_v1.zip) | `data/datasets/eqa/mp3d/v1/` | [`datasets/eqa/mp3d.yaml`](configs/datasets/eqa/mp3d.yaml) | 44 MB |
| [Visual Language Navigation](https://bringmeaspoon.org/) | MatterPort3D | [vln_r2r_mp3d_v1.zip](https://dl.fbaipublicfiles.com/habitat/data/datasets/vln/mp3d/r2r/v1/vln_r2r_mp3d_v1.zip) | `data/datasets/vln/mp3d/r2r/v1` | [`datasets/vln/mp3d_r2r.yaml`](configs/datasets/vln/mp3d_r2r.yaml) | 2.7 MB |
| [Image goal navigation](https://github.com/facebookresearch/habitat-lab/pull/333) | Gibson | [pointnav_gibson_v1.zip](https://dl.fbaipublicfiles.com/habitat/data/datasets/pointnav/gibson/v1/pointnav_gibson_v1.zip) | `data/datasets/pointnav/gibson/v1/` | [`datasets/imagenav/gibson.yaml`](configs/datasets/imagenav/gibson.yaml) | 385 MB |
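The zips can be fetched with any downloader; for instance, a small Python sketch for the HM3D ObjectNav archive (URL and target path taken from the table above; the zip's internal layout is assumed to match the documented directory):

    import urllib.request
    import zipfile

    # Fetch and unpack the HM3D ObjectNav episodes into the path
    # the dataset config expects.
    url = ("https://dl.fbaipublicfiles.com/habitat/data/datasets/"
           "objectnav/hm3d/v1/objectnav_hm3d_v1.zip")
    urllib.request.urlretrieve(url, "objectnav_hm3d_v1.zip")
    with zipfile.ZipFile("objectnav_hm3d_v1.zip") as zf:
        zf.extractall("data/datasets/objectnav/hm3d/v1/")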
4 changes: 4 additions & 0 deletions configs/datasets/objectnav/hm3d.yaml
@@ -0,0 +1,4 @@
DATASET:
  TYPE: ObjectNav-v1
  SPLIT: train
  DATA_PATH: data/datasets/objectnav/hm3d/v1/{split}/{split}.json.gz
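As a quick sanity check that the new dataset config resolves, something like the following should work once the zip is extracted into the default data layout (a minimal sketch, not part of the PR):

    import habitat
    from habitat.datasets import make_dataset

    # Merge the new dataset config over the defaults and build the
    # train split; this reads only the episode JSONs, not scene assets.
    config = habitat.get_config("configs/datasets/objectnav/hm3d.yaml")
    dataset = make_dataset(config.DATASET.TYPE, config=config.DATASET)
    print(len(dataset.episodes))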
56 changes: 56 additions & 0 deletions configs/tasks/objectnav_hm3d.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
ENVIRONMENT:
  MAX_EPISODE_STEPS: 500

SIMULATOR:
  TURN_ANGLE: 30
  TILT_ANGLE: 30
  ACTION_SPACE_CONFIG: "v1"
  AGENT_0:
    SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR']
    HEIGHT: 0.88
    RADIUS: 0.18
  HABITAT_SIM_V0:
    GPU_DEVICE_ID: 0
    ALLOW_SLIDING: False
  SEMANTIC_SENSOR:
    WIDTH: 640
    HEIGHT: 480
    HFOV: 79
    POSITION: [0, 0.88, 0]
  RGB_SENSOR:
    WIDTH: 640
    HEIGHT: 480
    HFOV: 79
    POSITION: [0, 0.88, 0]
  DEPTH_SENSOR:
    WIDTH: 640
    HEIGHT: 480
    HFOV: 79
    MIN_DEPTH: 0.5
    MAX_DEPTH: 5.0
    POSITION: [0, 0.88, 0]
TASK:
  TYPE: ObjectNav-v1
  POSSIBLE_ACTIONS: ["STOP", "MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT", "LOOK_UP", "LOOK_DOWN"]

  SENSORS: ['OBJECTGOAL_SENSOR', 'COMPASS_SENSOR', 'GPS_SENSOR']
  GOAL_SENSOR_UUID: objectgoal
  SEMANTIC_CATEGORY_SENSOR:
    WIDTH: 640
    HEIGHT: 480
    DATASET: "hm3d"
    CONVERT_TO_RGB: True
    RAW_NAME_TO_CATEGORY_MAPPING: "data/matterport_semantics/matterport_category_mappings.tsv"

  MEASUREMENTS: ['DISTANCE_TO_GOAL', 'SUCCESS', 'SPL', 'SOFT_SPL']

  DISTANCE_TO_GOAL:
    DISTANCE_TO: VIEW_POINTS
  SUCCESS:
    SUCCESS_DISTANCE: 0.1

DATASET:
  TYPE: ObjectNav-v1
  SPLIT: train
  DATA_PATH: "data/datasets/objectnav/hm3d/v1/{split}/{split}.json.gz"
  SCENES_DIR: "data/scene_datasets/"
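End to end, the task config can be exercised with the standard habitat API; a hedged sketch, assuming HM3D scenes and the ObjectNav episodes already live under data/:

    import habitat

    # Build an environment from the new ObjectNav HM3D task config
    # and roll out one episode with random actions.
    config = habitat.get_config("configs/tasks/objectnav_hm3d.yaml")
    with habitat.Env(config=config) as env:
        observations = env.reset()
        print(observations["objectgoal"])  # goal category id
        while not env.episode_over:
            observations = env.step(env.action_space.sample())
        print(env.get_metrics()["spl"])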
30 changes: 30 additions & 0 deletions configs/tasks/pointnav_hm3d.yaml
@@ -0,0 +1,30 @@
ENVIRONMENT:
  MAX_EPISODE_STEPS: 500
SIMULATOR:
  AGENT_0:
    SENSORS: ['RGB_SENSOR']
  HABITAT_SIM_V0:
    GPU_DEVICE_ID: 0
  RGB_SENSOR:
    WIDTH: 256
    HEIGHT: 256
  DEPTH_SENSOR:
    WIDTH: 256
    HEIGHT: 256
TASK:
  TYPE: Nav-v0
  SUCCESS_DISTANCE: 0.2

  SENSORS: ['POINTGOAL_WITH_GPS_COMPASS_SENSOR']
  POINTGOAL_WITH_GPS_COMPASS_SENSOR:
    GOAL_FORMAT: "POLAR"
    DIMENSIONALITY: 2
  GOAL_SENSOR_UUID: pointgoal_with_gps_compass

  MEASUREMENTS: ['DISTANCE_TO_GOAL', 'SUCCESS', 'SPL']
  SUCCESS:
    SUCCESS_DISTANCE: 0.2
DATASET:
  TYPE: PointNav-v1
  SPLIT: train
  DATA_PATH: data/datasets/pointnav/hm3d/v1/{split}/{split}.json.gz
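Switching this config to another split only needs an override at load time; a small sketch, assuming a val split exists in the extracted dataset:

    import habitat

    # get_config applies "KEY value" override pairs on top of the yaml.
    config = habitat.get_config(
        "configs/tasks/pointnav_hm3d.yaml",
        ["DATASET.SPLIT", "val"],
    )
    print(config.DATASET.DATA_PATH.format(split=config.DATASET.SPLIT))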
10 changes: 10 additions & 0 deletions habitat/config/default.py
@@ -133,6 +133,16 @@ def __init__(self, *args, **kwargs):
_C.TASK.PROXIMITY_SENSOR.TYPE = "ProximitySensor"
_C.TASK.PROXIMITY_SENSOR.MAX_DETECTION_RADIUS = 2.0
# -----------------------------------------------------------------------------
# SEMANTIC CATEGORY SENSOR
# -----------------------------------------------------------------------------
_C.TASK.SEMANTIC_CATEGORY_SENSOR = CN()
_C.TASK.SEMANTIC_CATEGORY_SENSOR.HEIGHT = 480
_C.TASK.SEMANTIC_CATEGORY_SENSOR.WIDTH = 640
_C.TASK.SEMANTIC_CATEGORY_SENSOR.TYPE = "SemanticCategorySensor"
_C.TASK.SEMANTIC_CATEGORY_SENSOR.CONVERT_TO_RGB = True
_C.TASK.SEMANTIC_CATEGORY_SENSOR.DATASET = "mp3d"
_C.TASK.SEMANTIC_CATEGORY_SENSOR.RAW_NAME_TO_CATEGORY_MAPPING = ""
# -----------------------------------------------------------------------------
# SUCCESS MEASUREMENT
# -----------------------------------------------------------------------------
_C.TASK.SUCCESS = CN()
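Because SemanticCategorySensor reads observations["semantic"], it only produces output when the simulator's semantic sensor is also active; a hedged sketch of wiring both in from Python, using the config keys defined above:

    import habitat

    config = habitat.get_config("configs/tasks/objectnav_hm3d.yaml")
    config.defrost()
    # Enable the task-level category sensor defined above...
    config.TASK.SENSORS.append("SEMANTIC_CATEGORY_SENSOR")
    # ...and the simulator sensor that supplies raw instance ids.
    config.SIMULATOR.AGENT_0.SENSORS.append("SEMANTIC_SENSOR")
    config.freeze()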
119 changes: 119 additions & 0 deletions habitat/tasks/nav/nav.py
@@ -34,11 +34,18 @@
from habitat.core.utils import not_none_validator, try_cv2_import
from habitat.sims.habitat_simulator.actions import HabitatSimActions
from habitat.tasks.utils import cartesian_to_polar
from habitat.tasks.nav.semantic_constants import (
    GIBSON_CATEGORY_TO_TASK_CATEGORY_ID,
    MP3D_CATEGORY_TO_TASK_CATEGORY_ID,
    HM3D_CATEGORY_TO_TASK_CATEGORY_ID,
)
from habitat.utils.geometry_utils import (
    quaternion_from_coeff,
    quaternion_rotate_vector,
)
from habitat.utils.visualizations import fog_of_war, maps
from habitat_sim.utils.common import d3_40_colors_rgb
from PIL import Image

try:
    from habitat.sims.habitat_simulator.habitat_simulator import HabitatSim
@@ -514,6 +521,118 @@ def get_observation(
)


Contributor: Do we use this sensor for training or only for visualization purposes?

Contributor Author: This sensor is used for training models with GT semantic goal inputs.

Contributor: @srama2512, yes. But that GT data isn't available during evaluation, and from this PR it isn't clear how it will be replaced.

@registry.register_sensor(name="SemanticCategorySensor")
class SemanticCategorySensor(Sensor):
    r"""Lists the object categories for each pixel location.

    Args:
        sim: reference to the simulator for calculating task observations.
    """
    cls_uuid: str = "semantic_category"

    def __init__(
        self, sim: Simulator, config: Config, *args: Any, **kwargs: Any
    ):
        self._sim = sim
        self._current_episode_id = None
        self.mapping = None
        self.category_to_task_category_id = None
        self.instance_id_to_task_id = None
        self._initialize_category_mappings(config)

        super().__init__(config=config)

    def _get_uuid(self, *args: Any, **kwargs: Any):
        return self.cls_uuid

    def _initialize_category_mappings(self, config):
        assert config.DATASET in ["gibson", "mp3d", "hm3d"]
        if config.DATASET == "gibson":
            cat_mapping = GIBSON_CATEGORY_TO_TASK_CATEGORY_ID
        elif config.DATASET == "mp3d":
            cat_mapping = MP3D_CATEGORY_TO_TASK_CATEGORY_ID
        else:
            cat_mapping = HM3D_CATEGORY_TO_TASK_CATEGORY_ID
        self.category_to_task_category_id = cat_mapping
        if config.RAW_NAME_TO_CATEGORY_MAPPING != "":
            # Map raw annotation names to task categories via the TSV file.
            with open(config.RAW_NAME_TO_CATEGORY_MAPPING, "r") as fp:
                lines = fp.readlines()
            lines = lines[1:]  # skip the header row
            lines = [l.strip().split(" ") for l in lines]
            self.raw_to_cat_mapping = {}
            for l in lines:
                raw_name = l[1]
                cat_name = l[-1]
                if cat_name in cat_mapping:
                    self.raw_to_cat_mapping[raw_name] = cat_name
        else:
            self.raw_to_cat_mapping = {k: k for k in cat_mapping.keys()}

Contributor: If we keep this code, it should use a proper CSV/TSV reader rather than manual line splitting.

    def _get_sensor_type(self, *args: Any, **kwargs: Any):
        return SensorTypes.COLOR

    def _get_observation_space(self, *args: Any, **kwargs: Any):
        if self.config.CONVERT_TO_RGB:
            observation_space = spaces.Box(
                low=0,
                high=255,
                shape=(self.config.HEIGHT, self.config.WIDTH, 3),
                dtype=np.uint8,
            )
        else:
            observation_space = spaces.Box(
                low=np.iinfo(np.int32).min,
                high=np.iinfo(np.int32).max,
                shape=(self.config.HEIGHT, self.config.WIDTH),
                dtype=np.int32,
            )
        return observation_space

    def get_observation(
        self, *args: Any, observations, episode, **kwargs: Any
    ):
        episode_uniq_id = f"{episode.scene_id} {episode.episode_id}"
        if self._current_episode_id != episode_uniq_id:
            self._current_episode_id = episode_uniq_id
            # Get mapping from instance id to task id
            scene = self._sim.semantic_annotations()
            self.instance_id_to_task_id = np.ones(
                (len(scene.objects),), dtype=np.int64
            ) * -1  # Non-task objects are set to -1
            for obj in scene.objects:
                if obj is None:
                    continue
                obj_inst_id = int(obj.id.split("_")[-1])
                obj_name = obj.category.name()
                if obj_name in self.raw_to_cat_mapping:
                    obj_name = self.raw_to_cat_mapping[obj_name]
                    obj_task_id = self.category_to_task_category_id[obj_name]
                    self.instance_id_to_task_id[obj_inst_id] = obj_task_id
        # Set invalid instance IDs to unknown object 0
        semantic = np.copy(observations["semantic"])
        semantic[semantic >= self.instance_id_to_task_id.shape[0]] = 0
        # Map from instance id to task id
        semantic_category = np.take(self.instance_id_to_task_id, semantic)
        if self.config.CONVERT_TO_RGB:
            semantic_category = self.convert_semantic_to_rgb(semantic_category)
        return semantic_category

Contributor: The semantic sensor observation here could be a Tensor, right? That would create an unnecessary CPU <-> GPU copy. I think you can easily modify this to operate on both np.ndarray and torch.Tensor.

Contributor: This is caching.

Contributor: RGB conversion should live in def observations_to_image(observation: Dict, info: Dict) -> np.ndarray:. Something like this:

    if "semantic_category" in observation:
        flat_sem = observation[
            obj_semantic_name
        ]  # to move to same scale #.permute(2, 0, 1).unsqueeze(0).data.max(1)[1].cpu().numpy()[0]
        flat_sem[flat_sem == 41] = 40
        if not isinstance(flat_sem, np.ndarray):
            flat_sem = flat_sem.cpu().numpy()
        semantic_segmentation = (
            color_label(flat_sem).squeeze().transpose(1, 2, 0).astype(np.uint8)
        )
        egocentric_view.append(semantic_segmentation)

        if "objectgoal" in observation and "episode_info" in info:
            from habitat.tasks.nav.object_nav_task import task_cat2mpcat40

            # permute tensor to dimension [CHANNEL x HEIGHT X WIDTH]
            idx = task_cat2mpcat40[observation["objectgoal"][0]]
            goal_segmentation = (
                color_label(flat_sem == idx)
                .squeeze()
                .transpose(1, 2, 0)
                .astype(np.uint8)
            )
            egocentric_view.append(goal_segmentation)

Contributor Author: This sensor is not for visualization purposes. It is intended to be a model input for training with oracle semantics, so the logic should live within the sensor itself, right?

    def convert_semantic_to_rgb(self, x):
        max_valid_id = max(self.category_to_task_category_id.values())
        assert max_valid_id < 39
        # Map invalid values (-1) to max_valid_id + 1
        invalid_locs = x == -1
        x[x == -1] = max_valid_id + 1
        # Get RGB image
        semantic_img = Image.new("P", (x.shape[1], x.shape[0]))
        semantic_img.putpalette(d3_40_colors_rgb.flatten())
        semantic_img.putdata((x.flatten() % 40).astype(np.uint8))
        semantic_img = np.array(semantic_img.convert("RGB"))
        # Set pixels for invalid objects to (0, 0, 0)
        semantic_img[invalid_locs, :] = np.array([0, 0, 0])
        return semantic_img
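For intuition, the per-pixel lookup in get_observation is a single np.take gather; a tiny self-contained sketch with made-up instance ids, not tied to any real scene:

    import numpy as np

    # Hypothetical table: instance ids 0..4 -> task category ids,
    # with -1 marking instances outside the task vocabulary.
    instance_id_to_task_id = np.array([-1, 0, 5, -1, 2], dtype=np.int64)

    # A 2x3 "semantic" frame of per-pixel instance ids.
    semantic = np.array([[1, 2, 2], [4, 0, 3]])

    # Each pixel's instance id is replaced by its task category id.
    semantic_category = np.take(instance_id_to_task_id, semantic)
    # -> [[ 0  5  5]
    #     [ 2 -1 -1]]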


@registry.register_measure
class Success(Measure):
r"""Whether or not the agent succeeded at its task
Expand Down
65 changes: 65 additions & 0 deletions habitat/tasks/nav/semantic_constants.py
@@ -0,0 +1,65 @@
#!/usr/bin/env python3

# Copyright (c) Facebook, Inc. and its affiliates.
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.


Contributor: This file should live somewhere under habitat/datasets/object_nav/....

Contributor: Based on "Set invalid instance IDs to unknown object 0", you use 0 as unknown elsewhere, yet here 0 is also the id for 'chair'.

GIBSON_CATEGORY_TO_TASK_CATEGORY_ID = {
    'chair': 0,
    'dining table': 1,
    'book': 2,
    'vase': 3,
    'bottle': 4,
    'couch': 5,
    'bed': 6,
    'refrigerator': 7,
    'potted plant': 8,
    'sink': 9,
    'toilet': 10,
    'clock': 11,
    'towel': 12,
    'tv': 13,
    'oven': 14,
    'cup': 15,
    'umbrella': 16,
    'bowl': 17,
    'gym_equipment': 18,
    'bench': 19,
    'clothes': 20
}


Contributor: This is different from https://github.com/niessner/Matterport/blob/master/metadata/mpcat40.tsv, which will likely lead to confusion.

MP3D_CATEGORY_TO_TASK_CATEGORY_ID = {
    'chair': 0,
    'table': 1,
    'picture': 2,
    'cabinet': 3,
    'cushion': 4,
    'sofa': 5,
    'bed': 6,
    'chest_of_drawers': 7,
    'plant': 8,
    'sink': 9,
    'toilet': 10,
    'stool': 11,
    'towel': 12,
    'tv_monitor': 13,
    'shower': 14,
    'bathtub': 15,
    'counter': 16,
    'fireplace': 17,
    'gym_equipment': 18,
    'seating': 19,
    'clothes': 20
}


HM3D_CATEGORY_TO_TASK_CATEGORY_ID = {
    'chair': 0,
    'bed': 1,
    'plant': 2,
    'toilet': 3,
    'tv_monitor': 4,
    'sofa': 5,
}
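For logging or visualization it can help to invert these tables; a one-liner sketch (the name task_id_to_category is ours, not part of the PR):

    from habitat.tasks.nav.semantic_constants import (
        HM3D_CATEGORY_TO_TASK_CATEGORY_ID,
    )

    # Turn task ids back into readable names, e.g. for labeling
    # a colorized semantic frame.
    task_id_to_category = {
        v: k for k, v in HM3D_CATEGORY_TO_TASK_CATEGORY_ID.items()
    }
    print(task_id_to_category[4])  # tv_monitor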
5 changes: 5 additions & 0 deletions habitat/utils/visualizations/utils.py
@@ -230,6 +230,11 @@ def observations_to_image(observation: Dict, info: Dict) -> np.ndarray:
            depth_map = depth_map.astype(np.uint8)
            depth_map = np.stack([depth_map for _ in range(3)], axis=2)
            render_obs_images.append(depth_map)
        elif "semantic_category" in sensor_name:
            semcat = observation[sensor_name]
            if not isinstance(semcat, np.ndarray):
                semcat = semcat.cpu().numpy()
            render_obs_images.append(semcat)

# add image goal if observation has image_goal info
if "imagegoal" in observation:
4 changes: 2 additions & 2 deletions habitat_baselines/common/obs_transformers.py
@@ -74,7 +74,7 @@ def __init__(
        self,
        size: int,
        channels_last: bool = True,
-       trans_keys: Tuple[str, ...] = ("rgb", "depth", "semantic"),
+       trans_keys: Tuple[str, ...] = ("rgb", "depth", "semantic_category"),
    ):
        """Args:
            size: The size you want to resize the shortest edge to
@@ -145,7 +145,7 @@ def __init__(
        self,
        size: Union[numbers.Integral, Tuple[int, int]],
        channels_last: bool = True,
-       trans_keys: Tuple[str, ...] = ("rgb", "depth", "semantic"),
+       trans_keys: Tuple[str, ...] = ("rgb", "depth", "semantic_category"),
    ):
        """Args:
            size: A sequence (h, w) or int of the size you wish to resize/center_crop.
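With "semantic_category" in the default trans_keys, the baseline observation transforms now resize it alongside rgb; a hedged sketch with illustrative shapes:

    import torch
    from habitat_baselines.common.obs_transformers import (
        ResizeShortestEdge,
        apply_obs_transforms_batch,
    )

    # Illustrative channels-last batch: an RGB frame plus the
    # colorized semantic category frame from the new sensor.
    batch = {
        "rgb": torch.zeros(1, 480, 640, 3),
        "semantic_category": torch.zeros(1, 480, 640, 3),
    }

    # Both keys match trans_keys, so both get the shortest edge
    # resized to 256.
    batch = apply_obs_transforms_batch(batch, [ResizeShortestEdge(256)])
    print(batch["semantic_category"].shape)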