
Why use an additional model to extract object features in RE-ID if we have already done it? #1770

Open
1 task
daviduarte opened this issue Dec 11, 2024 · 5 comments
Labels
question Further information is requested

Comments

@daviduarte

daviduarte commented Dec 11, 2024

Search before asking

  • I have searched the Yolo Tracking issues and found no similar bug report.

Question

Why do we use an additional ReID model to extract object features if we have already applied an R-CNN/Keypoint R-CNN to detect persons? Wouldn't it be easier to get the feature vectors for each bounding box from the RoI Align layer of the R-CNN/Keypoint R-CNN and pass them to the tracker? For instance, in the torchvision_boxmot code:

First, we run:

```python
import torchvision
pose_model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=True)
```

to detect persons in the scene and extract their pose keypoints. However, this model already gives us access to each bounding box's features in its RoI Align layer.

Then we execute:

```python
from pathlib import Path
from boxmot import BotSort
tracker = BotSort(
    reid_weights=Path('osnet_x0_25_msmt17.pt'),  # ReID model to use
    device=device,  # `device` is defined earlier in the script
    half=False,
)
```

This loads a second model, osnet_x0_25_msmt17, solely to extract the object features.
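For context, this is roughly what a tracker does with per-box features, whichever model produces them: it compares the embeddings of existing tracks with those of new detections. The sketch below is a simplified illustration, not boxmot's implementation. The function names are hypothetical, and greedy matching stands in for the optimal assignment that trackers such as BoT-SORT solve after fusing appearance with motion/IoU cues.

```python
import math
from typing import List, Sequence, Tuple

def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine_distance_matrix(track_embs, det_embs) -> List[List[float]]:
    """Rows are tracks, columns are detections; entries are 1 - cosine similarity."""
    tracks = [l2_normalize(t) for t in track_embs]
    dets = [l2_normalize(d) for d in det_embs]
    return [[1.0 - sum(a * b for a, b in zip(t, d)) for d in dets] for t in tracks]

def greedy_match(cost: List[List[float]], max_dist: float = 0.3) -> List[Tuple[int, int]]:
    """Pair tracks and detections in order of increasing distance, ignoring pairs above max_dist."""
    candidates = sorted(
        (cost[i][j], i, j) for i in range(len(cost)) for j in range(len(cost[i]))
    )
    used_tracks, used_dets, matches = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_dist:
            break  # remaining candidates are even farther apart
        if i not in used_tracks and j not in used_dets:
            matches.append((i, j))
            used_tracks.add(i)
            used_dets.add(j)
    return matches

# Hypothetical 2-D embeddings for two existing tracks and two new detections.
tracks = [[1.0, 0.0], [0.0, 1.0]]
detections = [[0.0, 2.0], [3.0, 0.1]]
print(greedy_match(cosine_distance_matrix(tracks, detections)))  # [(1, 0), (0, 1)]
```

Because the tracker only needs one vector per box, swapping the embedding source is mechanically simple; whether the swapped-in vectors separate identities well is the real question.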

My question is: Isn't this a computational waste? Couldn't the features from R-CNN be used in the tracker instead of the osnet_x0_25_msmt17?
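To make the proposal concrete: RoI Align pools a fixed-size grid from the backbone feature map for each box by bilinearly sampling points inside each grid bin and averaging them. The following is a simplified single-channel sketch in plain Python, assuming a fixed `sampling_ratio` and no half-pixel offset (roughly the `aligned=False` behaviour of `torchvision.ops.roi_align`); the function names are hypothetical. In torchvision's Keypoint R-CNN the corresponding module should be `model.roi_heads.box_roi_pool`, whose multi-channel 7x7 outputs go through `model.roi_heads.box_head` to give one vector per box, but treat that as something to verify against your torchvision version.

```python
from typing import List, Sequence

def bilinear(fmap: List[List[float]], y: float, x: float) -> float:
    """Bilinearly interpolate fmap at (y, x); points far outside the map give 0."""
    h, w = len(fmap), len(fmap[0])
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0
    y, x = max(y, 0.0), max(x, 0.0)
    y0, x0 = int(y), int(x)
    if y0 >= h - 1:  # clamp to the last row
        y0 = y1 = h - 1
        y = float(y0)
    else:
        y1 = y0 + 1
    if x0 >= w - 1:  # clamp to the last column
        x0 = x1 = w - 1
        x = float(x0)
    else:
        x1 = x0 + 1
    ly, lx = y - y0, x - x0
    hy, hx = 1.0 - ly, 1.0 - lx
    return (hy * hx * fmap[y0][x0] + hy * lx * fmap[y0][x1]
            + ly * hx * fmap[y1][x0] + ly * lx * fmap[y1][x1])

def roi_align(fmap: List[List[float]], box: Sequence[float], out_size: int = 2,
              spatial_scale: float = 1.0, sampling_ratio: int = 2) -> List[List[float]]:
    """Pool box = (x1, y1, x2, y2), in image coordinates, into an out_size x out_size grid."""
    x1, y1, x2, y2 = (c * spatial_scale for c in box)
    bin_w = max(x2 - x1, 1.0) / out_size
    bin_h = max(y2 - y1, 1.0) / out_size
    grid = []
    for py in range(out_size):
        row = []
        for px in range(out_size):
            total = 0.0
            for iy in range(sampling_ratio):  # regular sample points inside each bin
                y = y1 + py * bin_h + (iy + 0.5) * bin_h / sampling_ratio
                for ix in range(sampling_ratio):
                    x = x1 + px * bin_w + (ix + 0.5) * bin_w / sampling_ratio
                    total += bilinear(fmap, y, x)
            row.append(total / (sampling_ratio * sampling_ratio))  # average the samples
        grid.append(row)
    return grid

fmap = [[float(x) for x in range(4)] for _ in range(4)]  # toy map whose value is the column index
print(roi_align(fmap, (0.0, 0.0, 2.0, 2.0)))  # [[0.5, 1.5], [0.5, 1.5]]
```

A stack of such pooled features per detection (flattened, or passed through the box head) is what the question proposes handing to the tracker instead of OSNet embeddings.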

daviduarte added the question (Further information is requested) label Dec 11, 2024
@mikel-brostrom
Owner

Object detectors are not trained to generate discriminative embeddings for same-class instances.
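One way to read "discriminative for same-class instances": embeddings of the same person should be much more similar to each other than to embeddings of other people. Here is a small sketch of that check. The function names are hypothetical and the 2-D vectors are purely illustrative toy data, not measurements from any model: "class-like" features point every person in the same direction, so the margin is near zero, while "ReID-like" features separate the two identities.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def identity_margin(embs, ids):
    """Mean same-identity similarity minus mean different-identity similarity."""
    same, diff = [], []
    for i, j in combinations(range(len(embs)), 2):
        bucket = same if ids[i] == ids[j] else diff
        bucket.append(cosine(embs[i], embs[j]))
    return sum(same) / len(same) - sum(diff) / len(diff)

# Hypothetical toy vectors: two people (ids 0 and 1), two crops each.
ids = [0, 0, 1, 1]
class_like = [[1.0, 0.10], [1.0, 0.12], [1.0, 0.09], [1.0, 0.11]]  # only "this is a person"
reid_like = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]       # separates the identities

print(identity_margin(class_like, ids))
print(identity_margin(reid_like, ids))
```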

@mikel-brostrom
Owner

mikel-brostrom commented Dec 11, 2024

You can try passing the embeddings from the detector to the tracker and evaluate on them 😄
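One way to run that comparison offline: collect crops of the same people at different times, embed each crop with both sources (detector RoI features and the OSNet model), and compute rank-1 retrieval accuracy, i.e. how often a query's nearest gallery embedding belongs to the same person. A minimal sketch with hypothetical function names and toy data follows; a full evaluation would use tracking metrics such as HOTA or IDF1 on a benchmark. To feed detector embeddings to the tracker itself, check whether your boxmot version's `update()` accepts precomputed embeddings (I believe recent versions have an optional `embs` argument, but that is an assumption to verify).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank1_accuracy(query_embs, query_ids, gallery_embs, gallery_ids):
    """Fraction of queries whose most similar gallery embedding has the same identity."""
    hits = 0
    for q, qid in zip(query_embs, query_ids):
        best = max(range(len(gallery_embs)), key=lambda k: cosine(q, gallery_embs[k]))
        hits += gallery_ids[best] == qid
    return hits / len(query_embs)

# Toy example with hypothetical numbers: two gallery identities, three queries.
gallery = [[1.0, 0.0], [0.0, 1.0]]
gallery_ids = ["a", "b"]
queries = [[0.9, 0.2], [0.1, 0.8], [0.6, 0.5]]
query_ids = ["a", "b", "b"]
print(rank1_accuracy(queries, query_ids, gallery, gallery_ids))  # third query is matched to "a"
```

Running the same function once on RoI features and once on OSNet embeddings of identical crops gives a like-for-like comparison.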

@daviduarte
Author

> Object detectors are not trained to generate discriminative embeddings for same-class instances.

Perfect!

I'm curious now. I'll test it 😄

Thank you!

@mikel-brostrom
Owner

mikel-brostrom commented Dec 12, 2024

Feel free to share your results here. It would be interesting to discuss them 😄

@Fleyderer
Contributor

This concept is called Joint Detection and Embedding (JDE) and is now becoming popular for real-time trackers.

But there are still many problems. These models are harder to train and deploy, and of course two separate models for slightly different tasks are always better than a compromise between detection and classification; the features are similar but far from identical.
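To make "harder to train" concrete: a JDE-style network optimizes its detection and embedding losses jointly, so their relative weights have to be balanced. One known approach is learnable task-uncertainty weighting (Kendall et al.), which, as far as I know, the original JDE paper (Wang et al., "Towards Real-Time Multi-Object Tracking") adopts. The sketch below only evaluates that formula on plain floats; the function name and loss values are hypothetical, and in a real model each `s_i` would be a trainable parameter.

```python
import math
from typing import Sequence

def uncertainty_weighted_loss(task_losses: Sequence[float], log_vars: Sequence[float]) -> float:
    """Fuse per-task losses as sum_i 0.5 * (exp(-s_i) * L_i + s_i).

    s_i is a per-task log-variance: raising it down-weights task i, and the
    additive s_i term penalizes raising it without limit.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per task loss")
    return sum(0.5 * (math.exp(-s) * loss + s) for loss, s in zip(task_losses, log_vars))

# Hypothetical loss values for classification, box regression and embedding.
losses = [0.8, 0.5, 3.2]
print(uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0]))  # plain 0.5 * sum of losses
print(uncertainty_weighted_loss(losses, [0.0, 0.0, 1.0]))  # embedding loss down-weighted
```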

And since ReID by itself gives only a small advantage, it's just not worth the time.
