Segmentation Message Types? #87

lexi-brt · 2023-04-25T20:42:19Z

Hi!

I'm looking for a std ros type to use for segmentation outputs.

Something like these:
https://github.com/DavidFernandezChaves/Detectron2_ros/blob/master/msg/Result.msg
https://github.com/akio/mask_rcnn_ros/blob/kinetic-devel/msg/Result.msg

Is there something in this package that's suitable for this already? If not, how would I go about contributing a proposal and getting something merged?

SteveMacenski · 2023-04-26T21:58:10Z

What are you looking for? These don't appear to me to be pixel-wise segmentation classes, unless you're only looking at the sensor_msgs/Image[] masks and not the sensor_msgs/RegionOfInterest[] boxes

https://github.com/ros-perception/vision_msgs/blob/ros2/vision_msgs/msg/Detection2D.msg does something like the boxes in those messages. I definitely don't disagree a segmentation message would be valuable and in discussion in #63. I think it might be good to start with a proposal and @Kukanani and I can review and we can go from there!

Kukanani · 2023-04-26T23:40:53Z

Yes, happy to consider any proposals on the segmentation front!

mintar · 2023-04-27T08:30:20Z

I agree with everything @SteveMacenski said. Personally, I'd go with one of the following approaches:

Either, create a new message type:

std_msgs/Header header
vision_msgs/Detection2DArray detections
sensor_msgs/Image[] masks

Or, publish those things on separate topics, one vision_msgs/Detection2DArray for the detections and one sensor_msgs/Image for each mask.

This also depends what's inside the mask image(s):

Is it a single segmentation image, where each pixel value is the class label?
Is it a mask for each individual object, with the pixel value representing the probability that the pixel belongs to the object?

If it's a single segmentation image, I'd go with approach (2) above, for the reasons I've outlined in this comment. If it's individual object masks (which is what Mask R-CNN is doing), approach (2) becomes very cumbersome/impossible, so I'd go with approach (1).

SteveMacenski · 2023-05-01T19:23:18Z

I'm not 100% sure I understand having detection and segmentation masks together - these are often different processes building bounding boxes vs pixel-wise segmentation masks (though I suppose a BB could be generated from a mask rather easily).

LabelInfo is meant to communicate the label's class IDs to string IDs, so that could be reused here just like in detections with synchronized topics.

I agree instance segmentation vs class segmentation adds in a wrench. For class segmentation, 1 image is OK, but for instance segmentation, we may need N images for the N instances or try to find a way to embed that in an imagine in another way. Perhaps a new Image-like message containing the class, instance, and probability info for each pixel so it could work for any situation (and instance = 0 for non-instance segmentation implementations)

gachiemchiep · 2023-08-27T02:21:11Z

@SteveMacenski @mintar
I think we could use the design of PasCalVOC dataset for this problem.

For example, this is JPEGImage :

The mask for class segmentation (semantic segmentation) is like this:

Then for instance segmentation, they add another mask for object like this:

By using this rule, only 2 mask images is needed.

SteveMacenski · 2023-08-28T17:46:49Z

But how does that distinguish the class of the instance? If you just have instance 1...N for 1...N objects, you'd have multiple 1 blocks representing different first-instances of N classes

I think that mask would need to have 2 values: 1 for the instance # and another for the class #. It doubles the message size which I don't love, but without doing bit shifting, that's I think the best we can do. For non-instance segmentation algorithms, that can be left empty/non-allocated so it shouldn't be a huge amount of overhead relative to the image segmentation message size.

Thoughts @mintar ?

gachiemchiep · 2023-08-28T23:31:14Z

@SteveMacenski
Maybe my writting is a little bit confused.

For sementic segmentation, 1 mask image is needed:
1 mask image for class as showed in 2nd picture.

For instance segmentation 's result, 2 mask images are needed.
1 mask image for class as showed in 2nd picture
1 mask image for object as showed in 3rd picture.

Instance segmentation can also be explained like :

First do the detection to find all object' s box. The box msg is Detection2D.msg
For each box, do the segmentation to find the object mask inside

instead of publishing entire image mask as above approach. We could cut off the mask for each box. then attach the mask image to each box's msg.

LorenzoFerriniCodes · 2024-12-05T14:41:22Z

Hi! I found this issue after reading the README. There, it is stated that segmentation masks should be published as images. I have some doubts about this strategy: should it be one image per detection? Should it be structured the same way as proposed by @gachiemchiep (two images, instances and classes) + one for the confidence? I am confused, as some popular object segmentation algorithms provide full image mask scores, and publishing a full size image for each detection is not feasible.

The first solution proposed by @mintar seems the most natural one to me, where for each detection a segmentation mask is provided. Is there any specific reason why this solution was not adopted?

SteveMacenski · 2024-12-05T21:25:24Z

You can have various channels in an image (i.e. rgb has 3, greyscale has 1) -- but I would expect it to have ~2 channels: class and confidence. if instances are provided by that segmentation model, then 3 :-) You wouldn't need 3x images, you'd have 1x image of 3 channels

mintar · 2024-12-06T10:31:30Z

You wouldn't need 3x images, you'd have 1x image of 3 channels

It sounds simpler and more efficient to have separate single-channel images, no? They can be in the same message.

Three single-channel images take up the same space as one three-channel image. If an implementation doesn't provide one type of information (e.g., instance segmentation), that image can be empty if the images are separated. If you mix it into a combined image, you're wasting space.

Also, you save the effort of having to mux and demux the separate images into a composite 3-channel image both on the sender's and the receiver's end.

LorenzoFerriniCodes · 2024-12-11T14:30:08Z

How should we handle overlapping masks with this image-based approach? @mintar solution with one mask for each detection would work perfectly for overlapping masks. Moreover, I can see the sum of the masks for the detected objects being smaller than a full image quite often.

Kukanani · 2024-12-13T03:12:42Z

usually, if there are detected objects smaller than the full image, you still want to maintain the entire image size for the segmentation, and use an "unknown" or "background" class for the other pixels. This is standard practice for segmentation pipelines, so that you maintain the position of the segmented objects in the image.

LorenzoFerriniCodes · 2024-12-13T08:50:43Z

By having a mask per detection, the semantic class would be part of the detection information. There would be no need to report the class in the mask itself, which could be a 1/0 (or 255/0) mask. The full image segmentation, if required (I can see an rviz plugin doing this) could be reconstructed by combining the information in the detection message (as the x, y coordinates of the related bounding box) with the provided mask.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Segmentation Message Types? #87

Segmentation Message Types? #87

lexi-brt commented Apr 25, 2023

SteveMacenski commented Apr 26, 2023

Kukanani commented Apr 26, 2023

mintar commented Apr 27, 2023 •

edited

Loading

SteveMacenski commented May 1, 2023

gachiemchiep commented Aug 27, 2023

SteveMacenski commented Aug 28, 2023

gachiemchiep commented Aug 28, 2023

LorenzoFerriniCodes commented Dec 5, 2024

SteveMacenski commented Dec 5, 2024

mintar commented Dec 6, 2024

LorenzoFerriniCodes commented Dec 11, 2024

Kukanani commented Dec 13, 2024

LorenzoFerriniCodes commented Dec 13, 2024

Segmentation Message Types? #87

Segmentation Message Types? #87

Comments

lexi-brt commented Apr 25, 2023

SteveMacenski commented Apr 26, 2023

Kukanani commented Apr 26, 2023

mintar commented Apr 27, 2023 • edited Loading

SteveMacenski commented May 1, 2023

gachiemchiep commented Aug 27, 2023

SteveMacenski commented Aug 28, 2023

gachiemchiep commented Aug 28, 2023

LorenzoFerriniCodes commented Dec 5, 2024

SteveMacenski commented Dec 5, 2024

mintar commented Dec 6, 2024

LorenzoFerriniCodes commented Dec 11, 2024

Kukanani commented Dec 13, 2024

LorenzoFerriniCodes commented Dec 13, 2024

mintar commented Apr 27, 2023 •

edited

Loading