
Failure on training #45

Open
wuqun-tju opened this issue Sep 24, 2024 · 2 comments

Comments

@wuqun-tju

Hello, when I try to train the model using BlendedMVS, it crashes on an assertion in losses.py:

assert valid_matches.shape == torch.Size([B, N]) and valid_matches.sum() > 0

Can you help me solve it?

The training command is:

torchrun --nproc_per_node=1 train.py --train_dataset "1500 @ BlendedMVS(split='train', ROOT='/data/dust3r_sync/dust3r/data/blendedmvs_processed', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5)" --test_dataset "1000 @ BlendedMVS(split='val', ROOT='/data/dust3r_sync/dust3r/data/blendedmvs_processed', resolution=(512,384), n_corres=1024, seed=777)" --model "AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', freeze='encoder', img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, two_confs=True)" --train_criterion "ConfLoss(Regr3D(L21, norm_mode='?avg_dis'), alpha=0.2) + 0.075*ConfMatchingLoss(MatchingLoss(InfoNCE(mode='proper', temperature=0.05), negatives_padding=0, blocksize=8192), alpha=10.0, confmode='mean')" --test_criterion "Regr3D_ScaleShiftInv(L21, norm_mode='?avg_dis', gt_scale=True, sky_loss_value=0) + -1.*MatchingLoss(APLoss(nq='torch', fp=torch.float16), negatives_padding=12288)" --pretrained "/data/dust3r_sync/dust3r/checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth" --lr 0.0001 --min_lr 1e-06 --warmup_epochs 1 --epochs 10 --batch_size 1 --accum_iter 1 --save_freq 1 --keep_freq 5 --eval_freq 1 --disable_cudnn_benchmark --output_dir "checkpoints/mast3r_e171_roma_test_blended"

@yocabon
Contributor

yocabon commented Sep 24, 2024

Hi,
It's possible that some pairs yield 0 matches; I have noticed this behavior before. It did not cause any issue with a higher batch size (there was always at least one pair with valid matches in the batch), so we didn't try to filter these pairs out and the code stayed as is.
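
A minimal illustration of this failure mode, using the shapes from the assertion and the training command above (the surrounding code in losses.py differs; this is just a sketch):

import torch

B, N = 1, 8192  # batch_size=1 and n_corres=8192, as in the command above

# A pair that yields zero matches produces an all-False row in valid_matches.
valid_matches = torch.zeros(B, N, dtype=torch.bool)

print(valid_matches.shape == torch.Size([B, N]))  # True: the shape check passes
print(bool(valid_matches.sum() > 0))              # False: this is what trips the assert

# With a larger batch, other pairs usually contribute valid matches, so the
# batch-wide sum stays positive and the assertion passes.
bigger_batch = torch.cat([valid_matches, torch.ones(3, N, dtype=torch.bool)])
print(bool(bigger_batch.sum() > 0))               # True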

@nam1410

nam1410 commented Oct 6, 2024

I'm having a similar issue. Do you have a recommendation for a fix? @yocabon
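
For anyone else hitting this with batch_size=1, one possible local workaround (not an official fix; the helper below is hypothetical and not part of the MASt3R code) is to skip batches that contain no valid correspondences before the matching loss is computed:

import torch

def has_valid_matches(valid_matches: torch.Tensor) -> bool:
    # valid_matches is expected to be the [B, N] boolean mask checked by the
    # assertion in losses.py; this helper is hypothetical, not part of MASt3R.
    return valid_matches.numel() > 0 and bool(valid_matches.any())

# Example: with batch_size=1, a pair with zero matches yields an all-False mask,
# and a training loop could skip the step instead of asserting.
empty_batch = torch.zeros(1, 8192, dtype=torch.bool)
print(has_valid_matches(empty_batch))  # False -> skip this iteration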
