
How to reproduce the MobilePose v2 result? Which diagonal edge is used for normalization? #71

Learningm opened this issue Aug 19, 2022 · 4 comments

@Learningm

Hi, I am interested in this amazing work, but I wonder how to reproduce the MobilePose v2 result.

How should I understand the loss 'per-vertex MSE normalized by diagonal edge length'? What do you mean by diagonal edge length, 2D or 3D? I guess it should be 2D because the output keypoints are 2D, but which diagonal edge exactly? The cuboid has six faces, so there are 12 face diagonals, plus the space diagonals of the 3D cuboid itself, and the space diagonals no longer have equal length once they are projected into 2D space.
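
For what it's worth, here is one plausible reading of that loss in PyTorch: divide both the predicted and ground-truth keypoints by the diagonal of the tight 2D box around the ground-truth keypoints, then take the MSE. The choice of that box diagonal as the normalizer is purely my assumption, not something the paper confirms.

```python
import torch

def normalized_vertex_mse(pred_kpts, gt_kpts, eps=1e-8):
    """Per-vertex MSE normalized by a 2D diagonal length.

    pred_kpts, gt_kpts: (B, 9, 2) tensors of 2D keypoints in pixels.
    The normalizer is the diagonal of the tight 2D box around the
    ground-truth keypoints -- an assumption, not confirmed by the paper.
    """
    mins = gt_kpts.min(dim=1).values              # (B, 2)
    maxs = gt_kpts.max(dim=1).values              # (B, 2)
    diag = torch.norm(maxs - mins, dim=-1)        # (B,)

    scale = diag.clamp(min=eps)[:, None, None]    # avoid division by zero
    return torch.nn.functional.mse_loss(pred_kpts / scale, gt_kpts / scale)
```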

I guess the training pipeline should be:

  1. Train the 2D detector on 2D bounding-box data.
  2. Use the 2D detector (ground-truth or predicted boxes both seem fine) to generate a cropping region, crop the image, adjust the ground-truth keypoints according to the crop, then use the backbone to predict the 9 2D keypoints and compute the loss (see the sketch after this list).
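
To make step 2 concrete, here is a minimal sketch of the cropping and keypoint adjustment, assuming a detector box (x0, y0, x1, y1) in full-image pixels and 9 ground-truth keypoints in the same frame (the function and variable names are mine, not from the repo):

```python
import cv2
import numpy as np

def crop_and_adjust(image, box, keypoints, out_size=224):
    """Crop the detector box, resize it to out_size, and move the 2D
    keypoints into the resized crop's pixel frame."""
    x0, y0, x1, y1 = [int(round(v)) for v in box]
    crop = cv2.resize(image[y0:y1, x0:x1], (out_size, out_size))

    # Shift to the crop origin, then scale to the resized crop.
    scale = np.array([out_size / (x1 - x0), out_size / (y1 - y0)])
    kpts = (np.asarray(keypoints, dtype=np.float32) - np.array([x0, y0])) * scale
    return crop, kpts
```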

Could you explain this part (which diagonal edge is used) in more detail? Thank you very much.

@Mechazo11

Hi @Learningm, I know this is an old issue, but have you figured out how, after computing the 9 2D keypoints, they were lifted to 3D space using EPnP?
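
Not speaking for the authors, but a common way to do that lifting is OpenCV's EPnP solver, assuming you know the object's 3D box dimensions (so you can write down the centroid plus eight corners in the object frame) and the camera intrinsics, and that your corner ordering matches the network's:

```python
import cv2
import numpy as np

def lift_with_epnp(kpts_2d, box_size, K):
    """Recover the object pose from the 9 predicted 2D keypoints via EPnP.

    kpts_2d:  (9, 2) keypoints in image pixels (centroid first, then corners)
    box_size: (w, h, d) of the object's 3D bounding box -- assumed known
    K:        3x3 camera intrinsic matrix
    """
    w, h, d = box_size
    # Model points in the object frame: centroid + 8 corners. The corner
    # ordering below is arbitrary and must match the keypoint convention
    # the network was trained with.
    corners = np.array([[sx * w / 2, sy * h / 2, sz * d / 2]
                        for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    model_pts = np.vstack([np.zeros((1, 3)), corners]).astype(np.float64)

    ok, rvec, tvec = cv2.solvePnP(model_pts, np.asarray(kpts_2d, np.float64),
                                  K, None, flags=cv2.SOLVEPNP_EPNP)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec  # object-to-camera rotation and translation
```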

@Learningm

> Hi @Learningm, I know this is an old issue, but have you figured out how, after computing the 9 2D keypoints, they were lifted to 3D space using EPnP?

@Mechazo11 I didn't figure it out. It's hard to reimplement with so few details given in the paper.

@Mechazo11

@Learningm I am going to try this direction; sharing it here to ask whether it makes sense to you too.

We start by passing a 224x224x3 PyTorch tensor of an object crop. Let's call the origin of this image crop $O$. Since the crop is part of the full image, we also know $O$'s coordinates in the global image frame. Global here means the coordinate frame of the entire image, not just the crop.

After ingesting the tensor, the network gives me 9 2D offsets, normalized by dividing them by the length of the diagonal of the image crop. The first row is the centroid and the rest are the eight corners (in whatever order they were defined).

To compute the loss I also calculate the normalized offsets of the 9 ground-truth 2D keypoints from $O$. The loss should then be the MSE or smooth L1 score between the two sets of offsets, plus any penalty term that is excluded from backpropagation.

What do you think of this approach? My idea is that rather than doing a coordinate shift, we capitalize on the well-known top-left origin that is commonly associated with 2D images.
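
If I read that description correctly, the loss part would look roughly like this in PyTorch (names are hypothetical); offsets are measured from the crop's top-left corner $O$ and normalized by the crop diagonal:

```python
import math
import torch
import torch.nn.functional as F

def offset_loss(pred_offsets, gt_kpts_crop, crop_size=224, smooth_l1=True):
    """Compare predicted and ground-truth offsets from the crop origin O.

    pred_offsets: (B, 9, 2) network output, already normalized by the crop diagonal
    gt_kpts_crop: (B, 9, 2) ground-truth keypoints in crop pixel coordinates,
                  i.e. offsets from the top-left corner O of the crop
    """
    diag = math.sqrt(2.0) * crop_size        # diagonal of the square crop
    gt_offsets = gt_kpts_crop / diag         # normalize the ground truth the same way
    if smooth_l1:
        return F.smooth_l1_loss(pred_offsets, gt_offsets)
    return F.mse_loss(pred_offsets, gt_offsets)
```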

@Learningm

@Mechazo11 I'm afraid I can't comment on your approach, since I have moved on to other topics instead of continuing in this direction.

As far as I know, this recent work on pose estimation looks pretty good: https://github.com/NVlabs/FoundationPose. Hope it helps!
