Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion about UUIDs #47

Closed
mintar opened this issue Dec 16, 2020 · 15 comments
Closed

Discussion about UUIDs #47

mintar opened this issue Dec 16, 2020 · 15 comments

Comments

@mintar
Copy link
Contributor

mintar commented Dec 16, 2020

As requested by @SteveMacenski, here's a new issue for continuing the discussion around UUIDs. As a starter, I'm going to summarize my comment from here: #43 (comment)

Main Question

Should we change the type of Detection*D/tracking_id to uuid_msgs/UniqueID?

Background

Previous discussion has happened in #25 and #43.

As a basis for this discussion, we'll use the ROS2 version of the messages from the ros2 branch. In Noetic and ROS2, there are two ID fields (see below):

  • ObjectHypothesisWithPose.id: This is the detected class (human, mug, ...). I think we can all agree that this should remain a string.
  • tracking_id: This is the ID of the object when tracked across multiple frames. The discussion is whether we should make this a UUID.
  • The single detections (without tracking across frames) don't have an ID. There was a discussion in the issues, but we decided against it, since each detection is already uniquely identified by the time stamp of its Detection*DArray message and its array index within that message.

Detection3D.msg

Header header
vision_msgs/ObjectHypothesisWithPose[] results
  # The unique ID of the object class. To get additional information about
  #   this ID, such as its human-readable class name, listeners should perform a
  #   lookup in a metadata database. See vision_msgs/VisionInfo.msg for more detail.
  string id

  float64 score
  geometry_msgs/PoseWithCovariance pose
vision_msgs/BoundingBox3D bbox
sensor_msgs/PointCloud2 source_cloud

# If this message was tracking result, this field set true.
bool is_tracking

# ID used for consistency across multiple detection messages. This value will
#   likely differ from the id field set in each individual ObjectHypothesis.
# If you set this field, be sure to also set is_tracking to True.
string tracking_id

Pros and Cons

The arguments for and against UUIDs boil down to:

Pro UUID:

  • If there are multiple object detector nodes, and some aggregator node just simply throws all their results together, UUIDs will prevent collisions between tracking IDs because they are globally unique. If there is only one object detector node, there won't be any collisions, since the publisher of the Detection*D message has to ensure that the tracking_id is locally unique.

Contra UUID:

  • Having the possibility of having human-readable strings as tracking_ids is much more user and developer friendly; apple-3456 is easier on the eyes than ec1cf114-7aa0-4402-8bf9-eee858118b7d when used in log outputs, RViz markers etc.
  • Perhaps there's not just a one-way flow of information from object detectors into the environment model, but perhaps the object detector re-aquires an object that is already known (by some ID) in the environment model and continues to track it. Having a string here gives us the flexibility of using the existing ID for tracking, no matter what format it is in.

Presumably, if you run multiple object detector nodes, they will all publish their results on separate topics. If you are worried about collisions, the aggregator node could simply prefix each tracking_id with the topic name, which would guarantee unique strings. In my view, this negates the only advantage of UUIDs. So I'm also opposed to changing this field to a UUID.

@gavanderhoorn
Copy link

  • Having the possibility of having human-readable strings as tracking_ids is much more user and developer friendly

pedantic, but doesn't this depend on the needs of the human in question? Putting UUIDs as strings in a field loses semantics, which is certainly a negative consequence of doing that. Introspection tools would have to parse the field to figure out it's a UUID. Some developers might consider that quite a loss.

Human readability can certainly be positive, but it doesn't seem objective to state in general it's better -- machines would likely disagree (if they could, but then they might also be able to understand human readable fields ;) ).

@mintar
Copy link
Contributor Author

mintar commented Dec 16, 2020

@gavanderhoorn Can you come up with a scenario where switching the tracking_id field to uuid_msgs/UniqueID is an advantage? I've tried to describe such a scenario above, and why I think it's easy to handle without forcing the use of UUIDs.

@mintar
Copy link
Contributor Author

mintar commented Dec 16, 2020

BTW, I grant you the point that declaring something as "much more user friendly" wasn't very objective of me. :)

@gavanderhoorn
Copy link

BTW, I grant you the point that declaring something as "much more user friendly" wasn't very objective of me. :)

that was mostly it. As I said: it was pedantic.

I do believe it's important to keep in mind that human-readability doesn't seem like something which would be too important in an actual application. During development: certainly. Afterwards? Depending on what the application is, there may never be any human eyes looking at those messages.

@mintar
Copy link
Contributor Author

mintar commented Dec 16, 2020

I absolutely agree. If somebody can convince me that requiring UUIDs has significant advantages in this use case, it might be worth sacrificing the advantages of permitting arbitrary strings. It's always a tradeoff.

@SteveMacenski
Copy link
Member

SteveMacenski commented Dec 16, 2020

ObjectHypothesisWithPose.id: This is the detected class (human, mug, ...). I think we can all agree that this should remain a string.

Agreed, though there's been an argument made in other msgs to have classes be enums where the first 0-N/2 are reserved for ROS standardization that we could create a list of class objects for enums to belong to. Then N/2 - N would be reserved for user-specific classes. That's what we agreed to for the radar_msgs. If that sounds interesting, we can talk about that too, but I was not planning on that being a UUID. The enum ints also makes alot of sense because AI libraries don't output dog, they output 7 which you have a file that you can use to remap to dog. So using ints actually seems like it would be a very natural way to do things (though maybe you don't want downstream nodes to have to know that remapping, so it could be useful to do it for them).

The single detections (without tracking across frames) don't have an ID.

Where is that represented in the messages? I would think they should still have an ID field (of type whatever we decide for the tracking_id). Lacking a track shouldn't negate the information that a raw detection has to offer.


On UUID for tracking_id in the Detection*D messages:

Having the possibility of having human-readable strings as tracking_ids is much more user and developer friendly; apple-3456 is easier on the eyes than ec1cf114-7aa0-4402-8bf9-eee858118b7d when used in log outputs, RViz markers etc.

Isn't that what the ID string is for in the actual ObjectHypothesis? I don't see a rationale for having both be non-standardized non-unique strings.

Perhaps there's not just a one-way flow of information from object detectors into the environment model, but perhaps the object detector re-aquires an object that is already known (by some ID) in the environment model and continues to track it. Having a string here gives us the flexibility of using the existing ID for tracking, no matter what format it is in.

This is sending some alarm bells in my mind. Why is the Detection*N messages having tracks at all? They're literally called Detections. This should not be dependent on tracks. While trackers might use the Detection message, this is pretty useless as a tracking message in all honesty because it doesn't contain any velocity (or acceleration) information, or uncertainty measures. I think the concept that Detection*N is a 'track' should be dismissed and we should only be talking about detections for the design of it. And with detections, then there's no reason that tracking_id should be named tracking_id (it can just be uuid or id) because its just uniquely identifying 1 detection of potentially many detections in a given image or pointcloud.

ObjectHypothesisWithPose[] is a little strange to me. I understand what its going for, it could be useful to output not just the actual class of the object of interest, but the confidences of all of the the other classes it could have been. However, why is there not a message parent-level ID for what the highest probably class is (e.g. answering the question "what is this object")? That could even be something as simple as a int for the index in the results array to the highest, or a separate ObjectHypothesisWithPose instance with just the highest. It seems strange to me to have to iterate through a list of potentially hundreds of classes to find out the only highest value that people most likely want. This makes me think that the resistance on the ID to a string is because people are using tracking_id to embed the actual class information! We can separate those ideas and allow best of both worlds: a parent-level class field and a unique id.

Having the possibility of having human-readable strings as tracking_ids is much more user and developer friendly

Agreed for classes of objects, but instances of tracks or instances of detections in an image have no physical or semantic meaning. If this is a "track" or image then there might be 20 of them per image frame and 100,000 image frames per run. There's no semantic or unique meaning that can be really gained here meaningfully. What seems like a more likely thing is that users are abusing this field that's supposed to uniquely identify the unique detection in an image frame with the class of that detection. It would be much better to simply offer a class string for the highest probability class as I suggest above and let the id be what its supposed to be, a unique identifier to separate N detections in an image from each other.


Regardless if we change tracking_id to uuid, I think that name must go and should be id or uuid if we go with uuids.

tl;dr my position

  • Results is nice, but we need to specifically offer an easy way to find the class of the highest probability detection without having to iterate over a results array of potentially 100's of classes
  • tracking_id -> id, the word "track" is being abused I believe. This is not a Track2D message, its a Detection2D message.
  • id to type uuid because now there's no value in being human readable with an easy class parent-level field. human34 is no different than class: "human", id: 4324-3432-243-432. You may concat them if you like if you want something in the <class><unique numbe> format.
  • Potentially remove is_tracking, I don't see how that could be used, there isn't any tracking information in the message to be used as a tracker, but open to hearing more about why that's here

@mintar
Copy link
Contributor Author

mintar commented Dec 17, 2020

TL; DR:

  • rename tracking_id to object_id or even just id.
  • tracking_id should remain optional; stateless object detectors that just work frame-by-frame should leave it empty.
  • keep it a string, so that keys in an environment model/database can be used here.
  • require that the results array be sorted in descending order by their score.
  • rename the old id field to type or classification or something similar. class would be perfect, but then it won't compile in rosjava (been there...) because every Java class already has a class member.
  • rename (edit: remove) the is_tracking field. If tracking_id is the empty string, that means that the ID is invalid; no need for an extra boolean flag.

Agreed, though there's been an argument made in other msgs to have classes be enums where the first 0-N/2 are reserved for ROS standardization that we could create a list of class objects for enums to belong to. Then N/2 - N would be reserved for user-specific classes. That's what we agreed to for the radar_msgs. If that sounds interesting, we can talk about that too, but I was not planning on that being a UUID. The enum ints also makes alot of sense because AI libraries don't output dog, they output 7 which you have a file that you can use to remap to dog. So using ints actually seems like it would be a very natural way to do things (though maybe you don't want downstream nodes to have to know that remapping, so it could be useful to do it for them).

I think it's better to keep this a string. Incidentally, this field was an int64 before, and based on actual use cases we changed it to string, which turned out to be very useful. True, detection libraries output ints, not strings, but the remapping to the actual class is very dataset-dependent, so you'll have to do some remapping anyway (int to int instead of int to string).

Also, what class a given integer corresponds to often evolves over time when you add/remove classes and retrain your network. This will lead to problems when the messages (e.g. from a months-old rosbag) and the remapping information stored elsewhere go out of sync. Putting the class as a string into the message avoids this (or at least makes it easier to detect, e.g. when the message contains a class that was renamed/removed in the meantime).

The single detections (without tracking across frames) don't have an ID.

Where is that represented in the messages? I would think they should still have an ID field (of type whatever we decide for the tracking_id). Lacking a track shouldn't negate the information that a raw detection has to offer.

I'll respond to that below.

On UUID for tracking_id in the Detection*D messages:

Having the possibility of having human-readable strings as tracking_ids is much more user and developer friendly; apple-3456 is easier on the eyes than ec1cf114-7aa0-4402-8bf9-eee858118b7d when used in log outputs, RViz markers etc.

Isn't that what the ID string is for in the actual ObjectHypothesis? I don't see a rationale for having both be non-standardized non-unique strings.

Ok, that's true. One could easily construct a "human-readable" string for logs or for an RViz text marker from the class ID plus (let's say) the last 4 chars of the UUID. My other argument (using an ID that's referenced elsewhere, like in an environment model) still holds.

Perhaps there's not just a one-way flow of information from object detectors into the environment model, but perhaps the object detector re-aquires an object that is already known (by some ID) in the environment model and continues to track it. Having a string here gives us the flexibility of using the existing ID for tracking, no matter what format it is in.

This is sending some alarm bells in my mind. Why is the Detection*N messages having tracks at all? They're literally called Detections. This should not be dependent on tracks. While trackers might use the Detection message, this is pretty useless as a tracking message in all honesty because it doesn't contain any velocity (or acceleration) information, or uncertainty measures. I think the concept that Detection*N is a 'track' should be dismissed and we should only be talking about detections for the design of it. And with detections, then there's no reason that tracking_id should be named tracking_id (it can just be uuid or id) because its just uniquely identifying 1 detection of potentially many detections in a given image or pointcloud.

Ok, perhaps the word "track" is not very well-chosen, so perhaps let's rename it to object_id.

My view on this is the following: The tracking_id uniquely identifies a certain object in the scene. This ID should be consistent across several frames. At that point, you're doing some rudimentary tracking (even if it's as basic as "the bounding boxes of these two subsequent detections overlap more than 90%"). I agree that any more sophisticated tracking information (velocity, acceleration, ...) has no place in a DetectionXD message.

If the ID had to be unique for every single detection and carried no semantic information at all, it would be pretty useless. What's the point in uniquely identifying "the third element of the Detection3DArray that was published at time 1604410630.213572502"? If you reference such a UUID in a different topic, it would mean that the receiver of the message would have to additionally subscribe to the Detection3DArray topic and go through the messages until it finds the UUID that you're talking about. At that point, it would be much better to just copy the respective Detection3D into your own message. Also, I can't find any other ROS messages that do it that way.

Also, I believe that the purpose of a Detection3DArray is to describe a potential scene configuration: a set of objects that can explain the scene. This will usually come directly from an object detector, but I strongly believe that the message should be general enough to support further filtering/enrichment from other nodes, which could involve an environment model. For these purposes, having an ID that uniquely identifies an object is essential.

ObjectHypothesisWithPose[] is a little strange to me. I understand what its going for, it could be useful to output not just the actual class of the object of interest, but the confidences of all of the the other classes it could have been. However, why is there not a message parent-level ID for what the highest probably class is (e.g. answering the question "what is this object")? That could even be something as simple as a int for the index in the results array to the highest, or a separate ObjectHypothesisWithPose instance with just the highest. It seems strange to me to have to iterate through a list of potentially hundreds of classes to find out the only highest value that people most likely want. This makes me think that the resistance on the ID to a string is because people are using tracking_id to embed the actual class information! We can separate those ideas and allow best of both worlds: a parent-level class field and a unique id.

My proposal: Require that the results array must be sorted in descending order of score. That way you can just get the first element if you're only interested in the highest score. Incidentally, that's what I did in one of my projects, but it's not codified in the messages yet.

[...] What seems like a more likely thing is that users are abusing this field that's supposed to uniquely identify the unique detection in an image frame with the class of that detection. It would be much better to simply offer a class string for the highest probability class as I suggest above and let the id be what its supposed to be, a unique identifier to separate N detections in an image from each other.

As I said above: The detections within a DetectionXDArray are already identifed by their array index and don't need an extra ID. We don't put the class into the tracking_id (that would be abuse); instead, we use it to identify a given object. I wouldn't say that's abuse; it's exactly what this field is for.

  • Potentially remove is_tracking, I don't see how that could be used, there isn't any tracking information in the message to be used as a tracker, but open to hearing more about why that's here

I don't see why it's there either. I believe that's because tracking_id was a UUID in the original proposal (#18), and it's a bit awkward to check if all 16 bytes of the UUID array are 0; it's one more for loop that just clutters the code, so that's probably why the author included the is_tracking flag. When we changed the type from UUID to string, we just forgot to remove the is_tracking field.

@SteveMacenski
Copy link
Member

SteveMacenski commented Dec 17, 2020

TL; DR:

    rename tracking_id to object_id or even just id.
    tracking_id should remain optional; stateless object detectors that just work frame-by-frame should leave it empty.
    keep it a string, so that keys in an environment model/database can be used here.
    require that the results array be sorted in descending order by their score.
    rename the old id field to type or classification or something similar. class would be perfect, but then it won't compile in rosjava (been there...) because every Java class already has a class member.
    rename the is_tracking field. If tracking_id is the empty string, that means that the ID is invalid; no need for an extra boolean flag.

OK, so agreement on the tracking_id -> id, we're getting somewhere.

require that the results array be sorted in descending order by their score.

That seems that too nitty-gritty of a thing to require, the outputs of alot of systems don't promise that so you'd be forcing people to own a sorting step.

rename the is_tracking field. If tracking_id is the empty string, that means that the ID is invalid; no need for an extra boolean flag.

To what?

rename the old id field to type or classification or something similar.

In which message, id doesn't exist for Detection*D, so I assume you mean object hypothesis. What's wrong with ID in that context? ID, class, and type are all interchangable in that context. I didn't think anyone had an issue with that.

This ID should be consistent across several frames.

UUIDs perfectly meet that need as well. And by your own comment The tracking_id uniquely identifies a certain object in the scene, emphasis on unique, if this is not a UUID or similar random generation, there's no possible way you can promise uniqueness.

@SteveMacenski
Copy link
Member

SteveMacenski commented Dec 17, 2020

A specific recommendation (minus comments):

Header header
ObjectHypothesisWithPose[] results
BoundingBox2D bbox
sensor_msgs/Image source_img
bool is_tracking
string id
string class
  • tracking_id -> id
  • Added class field so that you don't have to iterate through results to get the class type. Then need to iterate if you care about what else it could have been or metadata like confidences. Class could be type or something else if you like for the java issues, that's fine. I think you understand what I'm trying to propose though.

I still feel that id should be uuid especially when you have a separate parent level class. Now its much easier for you to uniquely identify tracks / detections and also have a human-readable name via the class. I think this is really the best compromise since it meets your core complaints about not being human-readable and ensures that something in this message is uniquely identifiable, regardless of the actual class type. It also fixes an oversight that the message doesn't provide an easy way to find the class ID of the detection easily.

@mintar
Copy link
Contributor Author

mintar commented Dec 18, 2020

OK, so agreement on the tracking_id -> id, we're getting somewhere.

Yup.

require that the results array be sorted in descending order by their score.

That seems that too nitty-gritty of a thing to require, the outputs of alot of systems don't promise that so you'd be forcing people to own a sorting step.

Yeah, it sounds like something many people would mess up.

rename the is_tracking field. If tracking_id is the empty string, that means that the ID is invalid; no need for an extra boolean flag.

To what?

Sorry, I meant "remove the is_tracking field", if we're keeping strings. If we switch back to UUID, we'll probably need it (see below).

rename the old id field to type or classification or something similar.

In which message, id doesn't exist for Detection*D, so I assume you mean object hypothesis. What's wrong with ID in that context? ID, class, and type are all interchangable in that context. I didn't think anyone had an issue with that.

Yes, I'm talking about ObjectHypothesis. I think it's very confusing to have two fields called id, even if they are at different levels of the hierarchy. There have been a couple of issues here that came just from the confusion, and I myself mix it up regularly because I find id to be a very unintuitive name for "class". Let's rename it so it's clear which one is the object ID and which one's the class.

This ID should be consistent across several frames.

UUIDs perfectly meet that need as well. And by your own comment The tracking_id uniquely identifies a certain object in the scene, emphasis on unique, if this is not a UUID or similar random generation, there's no possible way you can promise uniqueness.

UUIDs guarantee global uniqueness, i.e., IDs from multiple nodes won't collide. With a string field, we'll still have local uniqueness, i.e., the IDs from a single node publishing on a specific topic won't collide. Can you come up with a scenario where it's necessary to have global uniqueness?

Also note that UUIDs and strings are equal with regards to local uniqueness. If a publishing node is so buggy that it accidentally re-uses identifier strings, it's probably also buggy enough to re-use UUIDs (or not fill them at all).

Anyhow, after giving this more thought, I think I'm okay with using UUIDs. To recap, my main arguments pro-string were:

  1. Strings can be made human-readable. Counterargument: simply concatenating the class name + last 4 chars of the UUID gives a similar result.
  2. Keys from an external database can be used to uniquely identify objects. Counterargument: the tracking_id field should probably not be used for this.

I still don't really see the advantages of using UUIDs, but at least the disadvantages seem to be mitigated.

So if you really feel so strongly about using UUIDs, go ahead and do it, if @Kukanani agrees with you. I assume he's still a co-maintainer of this repo?

A specific recommendation (minus comments):

Header header
ObjectHypothesisWithPose[] results
BoundingBox2D bbox
sensor_msgs/Image source_img
bool is_tracking
string id
string class
* `tracking_id` -> `id`

* Added `class` field so that you don't have to iterate through `results` to get the class type. Then need to iterate if you care about what else it could have been or metadata like confidences. Class could be type or something else if you like for the java issues, that's fine. I think you understand what I'm trying to propose though.

If you can add message comments for id and class (and also rename the class field due to the Java issue), I think we're getting close to a PR. I think the message comment especially for the id field should clarify exactly what exactly the semantics of that field is. In my mind, it's like a tracking ID. In other words, if the ids of two DetectionXDs from different DetectionXDArrays match, then they are detections of the same object in the scene. The opposite isn't true: Just because two ids are different, that doesn't mean that they are two different objects. Tracking will get lost, so an object will be re-assigned a new id.

I think the id in ObjectHypothesis should be named something similar to class (but not exactly "class" because of Java), and the class here should then be called something like "most likely class", but shorter. You're the native speaker, maybe you can come up with some nice and crisp names for these things.


Another thing we should discuss is what happens if a node doesn't do tracking at all (which will probably be true in most cases)? If id is a string, then I'd expect the publishing node to simply leave id == "", which can easily be detected by the subscribing node. However, if we switch back to UUIDs, then what? Also just leave it unset? That means that the subscribing node will have to iterate over the 16-bit array to check if all elements are 0. The runtime is negligible, but it clutters up the code. Set it to a new random UUID? That works, but it drops the information that the publishing node wasn't doing any tracking at all; it just looks like it's very bad at tracking, since for every detection it opens a new track and never resumes it.

I think that's where the is_tracking flag came from. It doesn't make sense when using strings, but it makes sense when using UUIDs.


Last point: If we're pulling most_likely_class out of the ObjectHypothesisWithPose, we also need to pull out most_likely_pose (at least for Detection3D). At that point, why not also pull out most_likely_score? Or just ObjectHypothesisWithPose most_likely? (I don't like my names, but you get the point).

@SteveMacenski
Copy link
Member

SteveMacenski commented Dec 18, 2020

so...

  • remove is_tracking
  • tracking_id to id
  • id in object hypothesis to class or type (object_id)
  • class in root of message for easy access.

Actually, maybe we should leave object hypothesis alone and just make tracking_id to type so nothing changes there.

UUIDs guarantee global uniqueness, i.e., IDs from multiple nodes won't collide. With a string field, we'll still have local uniqueness, i.e., the IDs from a single node publishing on a specific topic won't collide. Can you come up with a scenario where it's necessary to have global uniqueness?

Easily. I have 3 cameras on a robot and I'm running CNNs on their images in 3 instances of a CNN node. I'm using those detections to tracking moving obstacles in the scene correlating AI detections to their 3D detections in the robot's environmental model. I want global uniqueness so that each detection output isn't mixed in with the others. This is actually an exact example of what we're doing right now in navigation2_dynamic.

So if you really feel so strongly about using UUIDs, go ahead and do it, if @Kukanani agrees with you. I assume he's still a co-maintainer of this repo?

Yes of course, I'm just getting the ball rolling

So what are you using id in the Detection2D for if class has the class type? Can you give me some examples of what you currently put into tracking_id that isn't just class information?

Another thing we should discuss is what happens if a node doesn't do tracking at all (which will probably be true in most cases)?

... it then should be a uuid so its globally unique 😄

Last point: If we're pulling most_likely_class out of the ObjectHypothesisWithPose, we also need to pull out most_likely_pose (at least for Detection3D). At that point, why not also pull out most_likely_score? Or just ObjectHypothesisWithPose most_likely? (I don't like my names, but you get the point).

Detection*D doesn't have a ObjectHypothesisWithPose, it has ObjectHypothesisWithPose[] of all of the scores of all of the classes in the detector. That can stay as is. We're just adding the info of the highest probability class directly to root so that you don't have to iterate though a potentially hundreds long list to get what you really wanted: the most likely one. We could instead create a message level ObjectHypothesisWithPose to replace type/id/class from the Detection*D so that its the most likely type + its score without iterating through results.

@mintar
Copy link
Contributor Author

mintar commented Dec 19, 2020

so...

  • remove is_tracking
  • tracking_id to id
  • id in object hypothesis to class or type (object_id)
  • class in root of message for easy access.

Not object_id (we're talking about the class here, not the object). Otherwise agreed!

Actually, maybe we should leave object hypothesis alone and just make tracking_id to type so nothing changes there.

I didn't get that. Are you proposing to rename tracking_id to type? That would be wrong.

UUIDs guarantee global uniqueness, i.e., IDs from multiple nodes won't collide. With a string field, we'll still have local uniqueness, i.e., the IDs from a single node publishing on a specific topic won't collide. Can you come up with a scenario where it's necessary to have global uniqueness?

Easily. I have 3 cameras on a robot and I'm running CNNs on their images in 3 instances of a CNN node. I'm using those detections to tracking moving obstacles in the scene correlating AI detections to their 3D detections in the robot's environmental model. I want global uniqueness so that each detection output isn't mixed in with the others. This is actually an exact example of what we're doing right now in navigation2_dynamic.

So that's the exact same scenario as I outlined in the body of this issue. The problem that track IDs from different detector nodes can collide can easily be solved by keeping the outputs of the 3 detectors separate in the aggregator node. You should probably do that anyway, because:

  1. It doesn't make sense trying to correlate track IDs published by Detector A with ones from Detector B; they'll never match.
  2. An object can be tracked simultaneously by two different detectors under different track IDs, and the aggregator node will probably have to try and match them.

There's a huge difference between the following two scenarios:

  1. Detector A sees two mugs simultaneously, Detector B sees nothing.
  2. Detector A sees one mug. Detector B sees one mug.

In case (1), you know there are at least two distinct mugs, and you shouldn't try to match them. In case (2), it's possible that there are either one or two mugs, so matching should be possible.

In short: Any implementation that just jumbles all object detections together is probably suboptimal. If we don't throw them all together, we don't need UUIDs. But also UUIDs don't hurt, so I'm okay with them.

So what are you using id in the Detection2D for if class has the class type? Can you give me some examples of what you currently put into tracking_id that isn't just class information?

Oh, now I get you. Yes, currently the IDs are just <class name> + <counter>. But I don't parse the class name back out of that string; that would be abuse. They could really be just about anything, as long as they are unique.

In one application, I have a node that takes in Detection3DArrays from an object detector, matches the objects with an existing environment model and then publishes a new Detection3DArray where the IDs are the object IDs from the environment model. But during our discussion I've come to realize that this is probably not the correct way to use the ID field; it should only contain tracking information. So I'm okay with UUIDs now.

Another thing we should discuss is what happens if a node doesn't do tracking at all (which will probably be true in most cases)?

... it then should be a uuid so its globally unique 😄

Not an answer to my question. 😄

If no tracking is taking place, we don't need an ID at all. It might be useful information to an aggregator node if it should even attempt to match IDs to previous detections, or if that's pointless because the publishing node always assigns new random IDs.

Last point: If we're pulling most_likely_class out of the ObjectHypothesisWithPose, we also need to pull out most_likely_pose (at least for Detection3D). At that point, why not also pull out most_likely_score? Or just ObjectHypothesisWithPose most_likely? (I don't like my names, but you get the point).

Detection*D doesn't have a ObjectHypothesisWithPose, it has ObjectHypothesisWithPose[] of all of the scores of all of the classes in the detector.

Small nitpick: It can be all the classes in the detector, but it can also be just a subset (such as the ones with non-zero scores). If the detector only outputs a single class, it can also just be a single-element array.

That can stay as is. We're just adding the info of the highest probability class directly to root so that you don't have to iterate though a potentially hundreds long list to get what you really wanted: the most likely one. We could instead create a message level ObjectHypothesisWithPose to replace type/id/class from the Detection*D so that its the most likely type + its score without iterating through results.

Yes, I got that and I fully agree that it's useful to have direct access to the most likely class. All I'm saying is that it's equally useful to have direct access to the corresponding pose, otherwise you're iterating through the results array again. And continuing that thinking, maybe you also want to have the corresponding score.

@Kukanani
Copy link
Collaborator

Thanks for hashing out the details of this issue. Given our running list of changes, as I understand them:

* remove `is_tracking`

No objections. I have never used this field, and I've never seen of anyone else using it. If anything this should probably be metadata in VisionInfo (not proposing that change right now, just pointing out that if someone wants this field back, let's talk about adding it to VisionInfo).

* `tracking_id` to `id`

No objections.

* `DetectionXD/id` (formerly `DetectionXD/tracking_id`) to a UUID rather than a string.

Sorry, but I still object.

  1. Perhaps it's just that I've never been deep enough into the development of a multi-detector, multi-object tracker, but I still don't understand the need to have every single detection have a globally unique identifier. I don't deny that this would be useful...just that I have never personally wanted it. But I don't understand @SteveMacenski 's use case. In particular, I would like clarification on what "global uniqueness so that each detection output isn't mixed in with the others" means. How does having a globally unique field on every single detection message help differentiate between three different detectors? Or is the point that each detector/detection algorithm supplies the same UUID to all of its detections? Please help me understand this.

  2. uuid_msgs is an additional upstream dependency. It is also a bit hard to get good details on. I believe that this is the correct reference. However, a Google search for uuid_msgs github pushes that link to the 5th spot. The wiki page, which is the first hit, is not helpful, and this appears to be another "false trail" that appears earlier in the search results. This is not seem like a developer-friendly dependency.

  3. I agree with @gavanderhoorn 's point that we lose semantics by using a string rather than a UUID. However, there is extra work required (potentially an extra dependency) to generate UUIDs, including adding a dependency. We also have boost::uuids::uuid fromHexString(std::string) as a standard conversion interface, but that seems to be a brittle function based on input (read function comments at above link). I could go on with this line of thinking (string vs UUID), but I don't want to continue any further down this line of thinking without knowing exactly what the proposed purpose of the UUID is (what needs to be differentiated from what, and why?)

* `id` in object hypothesis to `class` or `type` (`object_id`)

Complete with the merging of #48, so this can be dropped from discussion

* class in root of message for easy access.

I begrudgingly agree with this one...or at least the idea of it.

I definitely understand the need to get the top class (what we are proposing as id) from a probabilistic detector that outputs probabilities for many classes. Unfortunately, storing the same information in two places is a violation of DRY and sets off alarm bells in my head. This "top class" and the list of individual detections could easily get out of sync. I'm not aware of any other messages that use this technique. Again we would depend on people to carefully read the message comments (which I think rarely happens) to know how to construct the message correctly.

I propose a couple of options here (numbered for easy reference)

  1. Instead of class, we could use int index_of_best_hypothesis (long name that gets the point across). This allows the entire best hypothesis to be looked up, including its score. From my experience, this is a common need: you want to know the top class, but you also want to know how likely this top class is. Without indexing into the results, you'd have to sort/search the whole list of results again anyway to get the score. This option has a benefit, which is that the default value of 0 will work perfectly if the results list is sorted by score, without any extra work. It also has a drawback in that if results is not sorted and this field is not set, it's worse than useless
  2. Instead of class, we could provide ObjectHypothesisWithPose top_result. This gives the same ability as above (get the score and class of top result). It also is flexible, and we can make it clear in the comments: A single top result can be given in top_result, multiple results can be given with no top_result, or both can be provided for convenience.
  3. Something else? Open to other options.

Kukanani added a commit that referenced this issue Dec 20, 2020
This field does not seem useful, and we are not aware of anyone using it at this time. `VisionInfo` is probably a better place for this information anyway, if it were needed. 

See #47 for earlier discussions.
@mintar
Copy link
Contributor Author

mintar commented Dec 21, 2020

  • id in object hypothesis to class or type (object_id)

Complete with the merging of #48, so this can be dropped from discussion

No, that wasn't changed in #48. I've created a new pull request that does this: #52.

@SteveMacenski
Copy link
Member

I think we've settled that we're just going to leave it the same as it is with the new name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants