
Thank you for your help #6

Open

lindermanfire opened this issue Feb 20, 2025 · 9 comments

Comments

@lindermanfire

Dear Professor/Researcher,

Thank you sincerely for your groundbreaking work. After carefully studying your paper and implementing the provided codebase, I would like to clarify several technical details:

Architectural Clarification: Could you confirm whether the framework employs a pre-trained large language model specifically for novel view synthesis? If so, does the system leverage this pre-trained model to accelerate 3D reconstruction when processing new-perspective inputs?

Performance Benchmark: The paper mentions an impressive inference time of 0.5 seconds. Could you specify the GPU specifications (model and VRAM configuration) used to achieve this benchmark? Additionally, would you provide guidance on expected performance scaling across different hardware setups?

Optimization Strategy: For real-time rendering applications, what would be the recommended number of input frames to balance between reconstruction quality and latency requirements? Furthermore, does the system incorporate temporal calibration mechanisms to ensure consistency between models generated at different time instances?

Implementation Issue: During environment setup, I encountered a "ModuleNotFoundError: AsymmetricCroCo3DStereo Model". Could you kindly indicate the proper source to obtain the pre-trained weights and architectural configurations for this component? Is it available through your designated repository or via third-party platforms?

Your expert insights would be invaluable for both academic understanding and practical implementation.

Best regards

@zhanghe3z
Member

Could you please upload the error message?

@lindermanfire
Author

File "E:\FLARE\run_pose_pointcloud.py", line 223, in <module>
    main(args)
File "E:\FLARE\run_pose_pointcloud.py", line 79, in main
    model = eval(args.model)
NameError: name 'AsymmetricCroCo3DStereo' is not defined
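For context, my reading of the failure (a minimal standalone sketch, not the actual FLARE code): the script builds the model with eval(args.model), and eval() can only resolve class names that are already imported in run_pose_pointcloud.py, so a model class that is not imported there produces exactly this NameError.

# Hypothetical illustration, not FLARE code: eval() looks the class name up
# in the current namespace, so an unimported model class raises NameError.
model_spec = "AsymmetricCroCo3DStereo(patch_embed_cls='ManyAR_PatchEmbed')"
model = eval(model_spec)  # NameError: name 'AsymmetricCroCo3DStereo' is not defined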


@lindermanfire
Author

import sys

sys.argv = [
    "run_pose_pointcloud.py",
    "--test_dataset", "1 @ CustomDataset(split='train', ROOT='./assets/zzx', resolution=(1920,1080), seed=1, num_views=8, gt_num_image=0, aug_portrait_or_landscape=False, sequential_input=False)",
    "--model", "AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, two_confs=True, desc_conf_mode=('exp', 0, inf))",
    "--pretrained", "E://FLARE/checkpoints/geometry_pose.pth",
    "--test_criterion", "MeshOutput(sam=False)",
    "--output_dir", "log/",
    "--amp", "1",
    "--seed", "1",
    "--num_workers", "0",
]

This is what I set before running run_pose_pointcloud.py.

@zhanghe3z
Member

What's your run command?

@lindermanfire
Copy link
Author

I run run_pose_pointcloud.py with the sys.argv override shown in my previous comment. I am running on Windows with a single RTX 4090D.


@zhanghe3z
Member

You can manually modify line 29 in run_pose_pointcloud.py, changing the default --model argument from:

parser.add_argument('--model', default="AsymmetricCroCo3DStereo(patch_embed_cls='ManyAR_PatchEmbed')",

to:

AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, two_confs=True, desc_conf_mode=('exp', 0, inf))
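Roughly, line 29 would then look something like the sketch below; only the default= string changes, and any remaining keyword arguments of the original parser.add_argument call are kept as they were (they are omitted here).

# Hypothetical sketch of the edited line 29; only the default= string differs.
parser.add_argument(
    '--model',
    default="AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', "
            "img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', "
            "depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, "
            "enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, "
            "two_confs=True, desc_conf_mode=('exp', 0, inf))",
)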

@lindermanfire
Author


Thank you, but now it reports the following:

Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
Warning, cannot find cuda-compiled version of RoPE2D, using a slow pytorch version instead
e:\FLARE\dust3r\dust3r\inference.py:65: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
with torch.cuda.amp.autocast(enabled=bool(use_amp), dtype=dtype):
e:\FLARE\mast3r\model.py:395: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
with torch.cuda.amp.autocast(enabled=True, dtype=torch.bfloat16):
e:\FLARE\dust3r\croco\models\blocks.py:110: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
x = torch.nn.functional.scaled_dot_product_attention(q, k, v, scale = self.scale, dropout_p=0.).transpose(1, 2).reshape(B, N, C)
Index put requires the source and destination dtypes match, got Float for the destination and BFloat16 for the source.

Does this mean I need to reinstall the RoPE2D extension compiled with CUDA?
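For reference, a minimal standalone illustration (my own guess at the cause, not FLARE code) of how this "Index put" error can arise when bfloat16 values produced under autocast are written into a float32 buffer:

import torch

# Hypothetical repro: assigning a bfloat16 tensor into a float32 buffer via
# indexing raises the same "Index put requires the source and destination
# dtypes match" RuntimeError reported above.
dst = torch.zeros(4, 3)                          # float32 destination buffer
src = torch.randn(2, 3, dtype=torch.bfloat16)    # bfloat16 source (e.g. autocast output)
dst[torch.tensor([0, 2])] = src                  # RuntimeError: Index put requires ... dtypes match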

@zhanghe3z
Member

Try --amp 1

@lindermanfire
Author


Once again, thank you for your assistance with the deployment. I've successfully obtained the training results, but I have a few follow-up questions.

1. Regarding model visualization: in your web-based model viewer interface, how can I examine the rendered outputs from different camera perspectives? Currently, I can only see viewing options for point clouds and the various pose-estimation results (static images); I cannot find functionality to visualize cinematic rendering effects or to dynamically adjust the camera viewpoint for real-time rendering.

2. Regarding the method: does your approach essentially employ a pre-trained large-scale "world reconstruction" model to accelerate both camera pose estimation and the training of the 3D Gaussian Splatting (3DGS) model? If so, could you elaborate on how this foundation model contributes to the efficiency improvements in these two components of the pipeline?
