Search before asking
I have searched the Yolo Tracking issues and found no similar bug report.
Question
Why do we use an additional ReID model to extract object features if we have already applied an R-CNN/Keypoint R-CNN to detect persons? Wouldn't it be easier to get the feature vectors for each bounding box from the ROI Align layer in the R-CNN/Keypoint R-CNN and pass them to the tracker? For instance, in the torchvision_boxmot example:
1. We run the line
pose_model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=True)
to detect persons in the scene and extract pose keypoints. However, in this model we can already access the bounding-box features in the ROI Align layer (see the hook sketch after these steps).
2. Subsequently, we execute:
tracker = BotSort(
    reid_weights=Path('osnet_x0_25_msmt17.pt'),  # ReID model to use
    device=device,
    half=False,
)
This loads yet another model, osnet_x0_25_msmt17, just to extract appearance features for the detected objects.
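To make step 1 concrete, here is a minimal sketch of the feature access I mean (the forward hook is my own illustration, not code from the repo):

import torch
import torchvision

pose_model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=True).eval()

# Capture the pooled per-box feature maps as the ROI Align layer computes them.
pooled = []
hook = pose_model.roi_heads.box_roi_pool.register_forward_hook(
    lambda module, inputs, output: pooled.append(output.detach())
)

with torch.no_grad():
    detections = pose_model([torch.rand(3, 480, 640)])  # dummy image

hook.remove()
# pooled[0] has shape (num_proposals, 256, 7, 7). Caveat: these features
# belong to the RPN proposals, not one-to-one to the final post-NMS
# detections, so matching them to the output boxes takes extra bookkeeping.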
My question is: isn't this a computational waste? Couldn't the features from the R-CNN be used in the tracker instead of running osnet_x0_25_msmt17?
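For context on where that cost is paid, a rough usage sketch (dummy values; my understanding is that BotSort's update() crops each detection from the frame and embeds it with OSNet internally, every frame):

import numpy as np

dets = np.array([[50, 60, 120, 300, 0.92, 0]])   # one detection: x1, y1, x2, y2, conf, cls
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder BGR frame
# update() associates detections with tracks; the ReID embedding of each
# cropped box is computed inside this call.
tracks = tracker.update(dets, frame)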
This concept is called Joint Detection and Embedding (JDE) and is now becoming popular for real-time trackers.
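Conceptually (hypothetical names, not a real API):

# SDE (separate detection and embedding), as in this repo: two passes per frame
boxes = detector(frame)                 # detection network
embs = reid_model(crops(frame, boxes))  # separate ReID network on the crops

# JDE: one shared backbone with an extra embedding head, a single pass
boxes, embs = jde_model(frame)          # detections and embeddings together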
But there are still many problems: these models are harder to train and deploy, and two separate models for two slightly different tasks generally beat a single model that compromises between detection and re-identification; the features the two tasks need are similar but far from identical.
And since ReID by itself gives only a small advantage, it's often just not worth the time.