How to use outputs of layout / angles from a pretrained model? #55
Would it be possible to provide an example of how to use the SUN RGB-D model (v3 | mAP@0.15: 43.7, which uses 20211007_105247.pth and imvoxelnet_total_sunrgbd_fast.py) with an arbitrary in-the-wild image? Specifically, it's a bit confusing at the moment what assumptions to make regarding the "lidar2img" dict entries.
Any help would be greatly appreciated!
Hi Garrick,

Thanks for your interest in our research. I will try to answer some of your questions. First, I've never tried to run this code on images in the wild, only on the 4 datasets from our paper, but it should be somehow possible. We support KITTI, nuScenes, ScanNet and 3 benchmarks for SUN RGB-D. As I remember, for all 6 benchmarks we predict boxes in the world coordinate system, so we use both the extrinsics and intrinsics provided in the datasets. All these projection matrices are only used in this function. However, this function has a special case for the SUN RGB-D dataset and the Total3DUnderstanding benchmark. As I remember, we follow their idea of parametrizing the extrinsics matrix with 2 angles, and we predict these angles at inference time. So, regarding your question about extrinsics, the answer is: we don't use this matrix at inference time for the model you are interested in.

Hope these answers help you, or feel free to ask any new questions.
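For illustration, a minimal sketch of what the 2-angle parametrization could look like. This is not the repo's actual get_extrinsics; the axis conventions and rotation order here are assumptions to check against the source:

```python
import numpy as np

def extrinsics_from_angles(pitch, roll):
    """Illustrative only: build a 4x4 camera-tilt matrix from two predicted
    angles. The exact axis convention and multiplication order should be
    checked against the repo's get_extrinsics()."""
    rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])   # rotation about x (pitch)
    rz = np.array([[np.cos(roll), -np.sin(roll), 0],
                   [np.sin(roll), np.cos(roll), 0],
                   [0, 0, 1]])                            # rotation about z (roll, assumed)
    extrinsic = np.eye(4)
    extrinsic[:3, :3] = rx @ rz
    return extrinsic
```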
Hi Danila,

Thank you for the fast response! Your answers clear up a lot of my confusion about using the imvoxelnet_total_sunrgbd_fast model. I really appreciate you taking the time here.

To recap: if we were inferring with the above model on either SUN RGB-D data (where only the intrinsics are assumed) or in-the-wild data (where we may estimate or sometimes know the intrinsics), then the origin should be set to [0, 3, -1], and the extrinsics can be set to identity since they won't be used. By the way, as a sanity check I did verify that the input extrinsics array appears to have no effect on the model output, unlike the origin 3x1 array, which matches my understanding now.

My last confirmation is: for the above model, should we manually apply the predicted extrinsics (2 angles for pitch and roll) to each estimated box following the logic of get_extrinsics?
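For concreteness, a hedged sketch of the per-image meta described above, with identity extrinsics and origin [0, 3, -1]. The key names ('intrinsic', 'extrinsic', 'origin') and the example intrinsic values are assumptions to verify against the demo code:

```python
import numpy as np

# Example focal lengths / principal point -- placeholder values, not from the repo.
fx, fy, cx, cy = 529.5, 529.5, 365.0, 265.0

intrinsic = np.eye(4)
intrinsic[0, 0], intrinsic[1, 1] = fx, fy
intrinsic[0, 2], intrinsic[1, 2] = cx, cy

lidar2img = dict(
    intrinsic=intrinsic,             # camera -> pixels
    extrinsic=np.eye(4),             # not used by this model at inference
    origin=np.array([0., 3., -1.]),  # voxel-volume origin suggested in this thread
)
```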
I probably don't quite understand your last question. In my understanding, we need extrinsics for monocular 3D detection here because the ground truth box is parametrized by only one (not 3) rotation angle, around the vertical axis.

Btw, I saw your omni3d paper last week, and I'm taking this opportunity to say that it is really great :)
I just wanted to make sure that the get_extrinsics function was the logic I should be using, and it seems like it is from your comment! My understanding of the high-level flow for visualizing the model outputs is: intrinsics @ get_extrinsics(angles) @ vertices_3D_with_yaw_applied. I didn't notice the get_extrinsics function until recently; that, plus the origin clarifications you gave earlier, were the missing puzzle pieces. Feel free to close this issue at your convenience. Thanks again for the fast turnaround.
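A minimal sketch of that flow, assuming 4x4 homogeneous matrices and a column-vector convention (neither is taken from the repo):

```python
import numpy as np

def project_box(corners_3d, extrinsic, intrinsic):
    """Project yaw-applied 3D box corners (8, 3) to pixels following
    intrinsics @ get_extrinsics(angles) @ vertices. Conventions are assumed."""
    corners_h = np.concatenate([corners_3d, np.ones((8, 1))], axis=1)  # (8, 4)
    cam = extrinsic @ corners_h.T        # (4, 8): apply the predicted room tilt
    pix = intrinsic @ cam                # (4, 8): homogeneous pixel coordinates
    return (pix[:2] / pix[2:3]).T        # (8, 2): perspective divide
```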
Thank you! I would like to train ImVoxelNet on the Omni3D dataset in the future if possible; it's a really impressive baseline. I attached a few quick COCO examples from this model (we are not using these anywhere, I am just curious what happens with no additional training). In my opinion, given that this model is only trained on SUN RGB-D and we are merely guessing at the intrinsics, these images do pretty well. I'm using a pretty low threshold for visualizing and set origin=[0, 3, -1] as suggested. I'm very curious what the generalization power is when it's trained on 234k images.
I'm playing with the SUN RGB-D model (v3 | mAP@0.15: 43.7, which uses 20211007_105247.pth and imvoxelnet_total_sunrgbd_fast.py).
For each image I'm testing, I have only the RGB and a 3x3 intrinsic matrix, which goes from camera space to screen.
I've been able to follow the demo code in general so far! Perhaps I'm missing it, but the flow and pipeline of the available demos appear not to use the layout / angle outputs. However, the visualized images elsewhere seem to have layout or room-tilt predictions applied along with the per-object yaw angles.
I want to make sure that I'm using the SUN RGB-D model correctly. Are there any examples I can follow to make sure I can apply the room tilts to the objects? E.g., say my end goal is an 8-vertex mesh per object in camera coordinates.
For instance, show_result and _write_oriented_bbox seem to only use the yaw angle. It seems like those are the two main functions for visualizing (unless I'm missing some code).
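For reference, a rough sketch of the yaw-only corner construction those visualization helpers appear to cover. The corner ordering, box parametrization, and rotation axis here are assumptions, not the repo's exact code:

```python
import numpy as np

def box_corners_yaw_only(center, dims, yaw):
    """Sketch: 8 corners of a box from (center, dims, yaw) only, i.e. the
    yaw-only handling discussed above. Ordering and yaw axis are assumed."""
    dx, dy, dz = dims
    # unit box corners centered at the origin
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * dx / 2
    y = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * dy / 2
    z = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * dz / 2
    corners = np.stack([x, y, z], axis=1)               # (8, 3)
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])  # yaw about z (assumed)
    return corners @ rot.T + np.asarray(center)
```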
To be clear, the predictions are definitely being made as expected; it's only the exact steps for applying them that are ambiguous to me.