
Failure on training #45

Open
wuqun-tju opened this issue Sep 24, 2024 · 2 comments

Comments

@wuqun-tju

Hello, when I try to train the model using BlendedMVS, it crashes on an assertion in losses.py:

assert valid_matches.shape == torch.Size([B, N]) and valid_matches.sum() > 0

Can you help me solve it?

The training command is:

torchrun --nproc_per_node=1 train.py --train_dataset "1500 @ BlendedMVS(split='train', ROOT='/data/dust3r_sync/dust3r/data/blendedmvs_processed', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5)" --test_dataset "1000 @ BlendedMVS(split='val', ROOT='/data/dust3r_sync/dust3r/data/blendedmvs_processed', resolution=(512,384), n_corres=1024, seed=777)" --model "AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', freeze='encoder', img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, two_confs=True)" --train_criterion "ConfLoss(Regr3D(L21, norm_mode='?avg_dis'), alpha=0.2) + 0.075*ConfMatchingLoss(MatchingLoss(InfoNCE(mode='proper', temperature=0.05), negatives_padding=0, blocksize=8192), alpha=10.0, confmode='mean')" --test_criterion "Regr3D_ScaleShiftInv(L21, norm_mode='?avg_dis', gt_scale=True, sky_loss_value=0) + -1.*MatchingLoss(APLoss(nq='torch', fp=torch.float16), negatives_padding=12288)" --pretrained "/data/dust3r_sync/dust3r/checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth" --lr 0.0001 --min_lr 1e-06 --warmup_epochs 1 --epochs 10 --batch_size 1 --accum_iter 1 --save_freq 1 --keep_freq 5 --eval_freq 1 --disable_cudnn_benchmark --output_dir "checkpoints/mast3r_e171_roma_test_blended"

@yocabon
Contributor

yocabon commented Sep 24, 2024

Hi,
It's possible that some pairs yield 0 matches; I have noticed this behavior before. It did not cause any issue with a higher batch size (there was always at least one pair with valid matches in the batch), so we didn't try to filter these pairs out and the code stayed as is.
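
A minimal illustration of this failure mode, using the shapes from the assertion and the training command above (the surrounding code in losses.py differs; this is just a sketch):

import torch

B, N = 1, 8192  # batch_size=1 and n_corres=8192, as in the command above

# A pair that yields zero matches produces an all-False row in valid_matches.
valid_matches = torch.zeros(B, N, dtype=torch.bool)

print(valid_matches.shape == torch.Size([B, N]))  # True: the shape check passes
print(bool(valid_matches.sum() > 0))              # False: this is what trips the assert

# With a larger batch, other pairs usually contribute valid matches, so the
# batch-wide sum stays positive and the assertion passes.
bigger_batch = torch.cat([valid_matches, torch.ones(3, N, dtype=torch.bool)])
print(bool(bigger_batch.sum() > 0))               # True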

@nam1410

nam1410 commented Oct 6, 2024

I'm having a similar issue. Do you have a recommendation for a fix? @yocabon
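
For anyone else hitting this with batch_size=1, one possible local workaround (not an official fix; the helper below is hypothetical and not part of the MASt3R code) is to skip batches that contain no valid correspondences before the matching loss is computed:

import torch

def has_valid_matches(valid_matches: torch.Tensor) -> bool:
    # valid_matches is expected to be the [B, N] boolean mask checked by the
    # assertion in losses.py; this helper is hypothetical, not part of MASt3R.
    return valid_matches.numel() > 0 and bool(valid_matches.any())

# Example: with batch_size=1, a pair with zero matches yields an all-False mask,
# and a training loop could skip the step instead of asserting.
empty_batch = torch.zeros(1, 8192, dtype=torch.bool)
print(has_valid_matches(empty_batch))  # False -> skip this iteration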
