In Table 2 of our NeurIPS'22 paper, we compare pre-training on pairs obtained:
- by sampling two different viewpoints (with some overlap) on Habitat
- by applying two different transforms on single images from ImageNet-1K
The latter case performs quite poorly. Our guess is that when the reference image is derived directly from the same image as the target, the network can find an "easy" shortcut to solve the cross-view completion task by indirectly fitting the transformations.
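To make the suspected shortcut concrete, here is a minimal toy sketch (not the CroCo code). It assumes the two "transforms" are simply two overlapping crops of the same image, which is an assumption for illustration since the exact transforms are not listed here. All function names (`make_image`, `two_crops`, `copy_from_reference`) are hypothetical. Once the 2D shift between the crops is known, every overlapping target pixel can be copied verbatim from the reference, with no 3D reasoning needed.

```python
import random


def make_image(height, width, seed=0):
    """Toy grayscale image as a list of rows with random pixel values."""
    rng = random.Random(seed)
    return [[rng.randrange(256) for _ in range(width)] for _ in range(height)]


def crop(image, top, left, size):
    """Square crop of side `size` with its top-left corner at (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def two_crops(image, size, max_shift, rng):
    """Two overlapping crops of the SAME image (assumed stand-in for the
    'two different transforms'), plus the 2D shift between them."""
    h, w = len(image), len(image[0])
    top = rng.randrange(0, h - size - max_shift + 1)
    left = rng.randrange(0, w - size - max_shift + 1)
    dy = rng.randrange(0, max_shift + 1)
    dx = rng.randrange(0, max_shift + 1)
    reference = crop(image, top, left, size)
    target = crop(image, top + dy, left + dx, size)
    return reference, target, (dy, dx)


def copy_from_reference(reference, shift, size):
    """'Shortcut' predictor: fill target pixels by shifting the reference.
    Pixels that the reference does not cover stay None."""
    dy, dx = shift
    pred = [[None] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            ry, rx = y + dy, x + dx
            if ry < size and rx < size:
                pred[y][x] = reference[ry][rx]
    return pred


rng = random.Random(0)
image = make_image(32, 32)
reference, target, shift = two_crops(image, size=16, max_shift=4, rng=rng)
pred = copy_from_reference(reference, shift, size=16)

covered = [(y, x) for y in range(16) for x in range(16) if pred[y][x] is not None]
exact = all(pred[y][x] == target[y][x] for y, x in covered)
print(f"shift={shift}, covered={len(covered)}/256, exact copy={exact}")
```

With two real viewpoints of a 3D scene, no single 2D shift explains the target, so this kind of copying would not work. That is one possible reading of why the single-image pairs are a weaker pre-training signal, not a verified explanation.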
Hi, have you run any ablation experiments with CroCo on 3D vision tasks when pre-training on ImageNet?