Trying to replicate sp_v6 and descriptor loss #288
Hi, I can try to help you replicate the original results, but only up to a point. This work is indeed more than 4 years old, and I don't remember all the details of the experiments anymore. But regarding your three points:
I hope this can be of some help to you.
Indeed, you may want to tune
Since this issue is somewhat related to my issue #287 (comment), would you @martinarroyo mind explaining what changes you made in the code to obtain your results?
Apologies for the belated response. My changes were minimal: I only made some modifications to the I/O logic so that it would work in my infrastructure, and fixed some imports that were not working on my setup. The training logic was unaltered.
Hi @rpautrat, I have been getting the negative distance as zero at every step. Is this normal? What might have gone wrong? I am using the same hyperparameter values as you.
Hi, having a zero negative loss is not impossible, but it is surprising. You can look at its definition here: SuperPoint/superpoint/models/utils.py, line 123 at commit 361799f.
However, getting 0 at every step seems a bit fishy and too good to be true. I would expect it to be positive for at least a few samples. Maybe you can plot a few values at the line linked above to understand what is happening. Checking the positive loss would also be interesting.
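For context, the negative term is a hinge on the descriptor dot product, so it is exactly zero whenever all non-corresponding descriptor pairs score below the negative margin. Below is a minimal PyTorch sketch of that structure, using the margin and weighting values from the SuperPoint paper rather than this repository's exact code; the function and variable names are illustrative, not the repo's.

```python
import torch

# Margin and weighting values from the SuperPoint paper; the values in this
# repository's configs may differ.
positive_margin = 1.0
negative_margin = 0.2
lambda_d = 250

def descriptor_hinge_terms(dot_product_desc, s):
    """Per-pair hinge terms of the descriptor loss.

    dot_product_desc: dot products between pairs of descriptors
    s: 1.0 where the pair corresponds after warping, 0.0 otherwise
    """
    # Corresponding pairs are pushed above the positive margin...
    positive_dist = torch.clamp(positive_margin - dot_product_desc, min=0.0)
    # ...and non-corresponding pairs are pushed below the negative margin.
    negative_dist = torch.clamp(dot_product_desc - negative_margin, min=0.0)
    return lambda_d * s * positive_dist, (1.0 - s) * negative_dist

dots = torch.tensor([0.9, 0.1, -0.3])  # normalized descriptors => dots in [-1, 1]
s = torch.tensor([1.0, 0.0, 0.0])
pos, neg = descriptor_hinge_terms(dots, s)
print(pos)  # ~[25, 0, 0]
print(neg)  # [0, 0, 0]: both non-corresponding pairs already score below the 0.2 margin
```

With L2-normalized descriptors the dot products are bounded in [-1, 1], so a zero negative term for many pairs is plausible; a zero at every single step is what looks suspicious.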
SuperPoint/superpoint/models/utils.py, line 140 at commit 361799f: `positive_sum = torch.sum(valid_mask * lambda_d * s * positive_dist) / valid_mask_norm`
I would suggest printing a few values to debug your code and understand why the negative loss becomes zero in your case. This sounds too good to be true.
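One lightweight way to do that printing, sketched in the same PyTorch style as the snippet quoted above (the function and argument names are mine, not the repository's):

```python
import torch

def log_hinge_stats(positive_dist, negative_dist, step, every=100):
    """Print summary statistics of the descriptor hinge terms every `every` steps.

    positive_dist / negative_dist are assumed to be the hinge tensors computed
    inside the descriptor loss (before summing); adapt the names to your fork.
    """
    if step % every:
        return
    with torch.no_grad():
        frac_neg_active = (negative_dist > 0).float().mean().item()
        print(
            f"step {step}: "
            f"pos mean={positive_dist.mean().item():.4g}, "
            f"neg mean={negative_dist.mean().item():.4g}, "
            f"neg>0 fraction={frac_neg_active:.2%}"
        )
```

If the `neg>0 fraction` stays at exactly 0% for thousands of steps, the descriptor scale or the negative margin is probably off for your setup.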
[This is somewhat related to #287, but I'll open a separate issue so as not to pollute that thread.]
Hi @rpautrat, thanks for this work and also for providing support for the repo! I am trying to reproduce the results that you report in the README on HPatches, as a first step towards making some changes to the model. To save some time, I labeled the COCO dataset using the pretrained model listed in the README (MagicPoint (COCO)) and launched a training with the `superpoint_coco.yaml` config in its current state at HEAD. I had to make minor modifications to the codebase to get it to work in my infrastructure (mostly I/O), but there should be no changes that affect training.

I noticed that the negative and positive distances reported in TensorBoard oscillate within a very small range of values (roughly 1e-7 to 1e-5), which got me worried. This seems strange compared to the values reported in #277 (comment). For reference, here is how it looks on my current training (my machine restarted, so the graphs look a bit funny, apologies for that):

[TensorBoard plots of the positive and negative distances]

Precision and recall are also much lower than in the sp_v6 log: recall goes up to 0.6 there, while I can only get it to ~0.37.

I looked into the yaml file in the sp_v6 tarfile and noticed that the loss weights seemed to be adapted for the 'unnormalized' descriptors, so I reverted the changes introduced in 95d1cfd. This helps with the distances (the values are ~0.03 and ~0.02 for positive and negative):

[TensorBoard plots after reverting 95d1cfd]

But recall is still quite low (~0.38).
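To make the 'unnormalized' point concrete, here is a small sketch (with illustrative margin values from the paper, not the actual config weights) of why loss weights tuned for raw descriptors do not transfer to L2-normalized ones: the dot products live on very different scales, so the margins and lambda_d effectively change meaning.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Two random 256-D descriptors standing in for raw network outputs.
d1, d2 = torch.randn(256), torch.randn(256)

raw_dot = torch.dot(d1, d2)                                            # unbounded, depends on output scale
norm_dot = torch.dot(F.normalize(d1, dim=0), F.normalize(d2, dim=0))   # bounded in [-1, 1]

print(f"raw dot product:        {raw_dot.item():.2f}")
print(f"normalized dot product: {norm_dot.item():.4f}")

# Margins such as positive_margin=1.0 / negative_margin=0.2 only make sense on the
# normalized scale; with raw descriptors the hinge terms are much larger, so the
# loss weights have to be retuned accordingly -- presumably why the sp_v6 yaml
# differs from the current config at HEAD.
```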
I also evaluated the model with normalization on HPatches and the results look reasonable. For comparison, I also loaded the sp_v6 checkpoint and ran the same evaluation:

[HPatches evaluation results for my model and the sp_v6 checkpoint]

The change in quality is not too bad, but I am still concerned that the numbers are always slightly below the ones reported in the README of this repository, so I would like to ask the following: what was the exact setup used to train the sp_v6 model? I am assuming it was trained on 2 GPUs using COCO data labeled with this checkpoint, but please correct me if I am wrong.

Thanks a lot in advance for your help, much appreciated!