Weird reproducibility error at inference #332

Open
leblond14u opened this issue Nov 8, 2024 · 9 comments

Comments

@leblond14u

Hi,

I'm using LightGlue with SuperPoint and I noticed some odd SuperPoint behavior.
When keypoints are extracted on the sacre_coeur data (with no maximum number of points), they look wrong:
[Screenshot: Capture d’écran du 2024-11-08 09-56-13]
When extracted with a maximum of 2048 points, they look a bit better, but still not right:
[Screenshot: Capture d’écran du 2024-11-08 09-55-05]
As a result, I get no matches between the two images.

Has anybody encountered the same issue with the descriptor, or does anyone know how to solve it?

Thanks in advance,
Best regards,

Hugo

@rpautrat
Owner

rpautrat commented Nov 8, 2024

Hi, are you sure that the SuperPoint is correctly initialized, i.e. with the right pre-trained weights loaded?

Do you get the same if you test it on other images?
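
One way to check the initialization question above is to compare the parameter names the model expects with the names stored in the checkpoint file. The sketch below is a minimal, hypothetical helper (`diff_state_dict_keys` is not part of SuperPoint or LightGlue, and the key names in the example are made up). It only needs the two collections of names, which in PyTorch would typically come from `model.state_dict().keys()` and from the keys of the dictionary returned by `torch.load(path, map_location="cpu")`.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names, both sorted.

    missing:    names the model expects but the checkpoint lacks
                (those layers would keep their random initialisation).
    unexpected: names in the checkpoint that the model does not use.
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Hypothetical key names, for illustration only.
missing, unexpected = diff_state_dict_keys(
    ["conv1a.weight", "conv1a.bias", "convDb.weight"],
    ["conv1a.weight", "conv1a.bias", "module.convDb.weight"],
)
print("missing:", missing)
print("unexpected:", unexpected)
```

If both lists are empty, every layer received weights from the file. A frequent cause of mismatches is a leftover prefix such as `module.` from a wrapped model. PyTorch's `load_state_dict(..., strict=False)` reports the same two lists as `missing_keys` and `unexpected_keys`.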

@leblond14u
Author

leblond14u commented Nov 8, 2024

Hi,
Thanks for your answer.
I get the same kind of results for the easy DSC_0410 scenario:
image
image

My weights are loaded from https://github.com/cvg/LightGlue/releases/download/v0.1_arxiv/superpoint_v1.pth.
Still, no matches are found.

@GoroYeh-HRI

Hi,
I'm running match_features_demo.py with the pretrained model sp_v6, as instructed here:
https://github.com/rpautrat/SuperPoint?tab=readme-ov-file#matching-features-demo-with-pretrained-weights

However, the descriptors extracted by the SuperPoint model for the second image have shape (0, 256), which causes an error and makes matching impossible.
image

This is the pair of images being loaded.
They are hpatches-sequences-release/i_pool/1.ppm and 6.ppm.
image
Any idea why I get an empty descriptor array for the darker image?
Thanks.
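
Independently of why no keypoints were detected, the matching step can be made robust to an empty descriptor array such as the (0, 256) one described above. The following is a minimal NumPy sketch of mutual nearest-neighbour matching with an explicit empty-input guard. It is an illustration only, not the matcher used by match_features_demo.py, and `mutual_nn_match` is a hypothetical name.

```python
import numpy as np


def mutual_nn_match(desc0, desc1):
    """Mutual nearest-neighbour matching by dot-product similarity.

    desc0: (N, D) array, desc1: (M, D) array.
    Returns a (K, 2) integer array of (index in desc0, index in desc1) pairs.
    If either image has no descriptors, an empty (0, 2) array is returned
    instead of raising an error.
    """
    desc0 = np.asarray(desc0, dtype=np.float32)
    desc1 = np.asarray(desc1, dtype=np.float32)
    if desc0.shape[0] == 0 or desc1.shape[0] == 0:
        return np.empty((0, 2), dtype=np.int64)

    sim = desc0 @ desc1.T          # (N, M) similarity matrix
    nn01 = sim.argmax(axis=1)      # best desc1 index for each desc0 row
    nn10 = sim.argmax(axis=0)      # best desc0 index for each desc1 row
    idx0 = np.arange(desc0.shape[0])
    keep = nn10[nn01] == idx0      # keep pairs that choose each other
    return np.stack([idx0[keep], nn01[keep]], axis=1)


# An image without detections simply yields zero matches.
matches = mutual_nn_match(np.zeros((5, 256)), np.zeros((0, 256)))
print(matches.shape)
```

The guard only avoids the crash. An empty result on a well-lit, textured image still indicates that the detector itself is not behaving as expected.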

@leblond14u
Author

leblond14u commented Nov 15, 2024

Update:
@rpautrat It seems my problem is specific to inference on CUDA devices.
When running SuperPoint on my CPU, I have no trouble finding good matches on the sacre_coeur scenario.
My setup is:

  • Nvidia A5000
  • CUDA Version: 12.2
  • torch 2.1.2
  • torchvision 0.16.2

image
image
@GoroYeh-HRI Is your problem also related to GPU usage?

Best,

Hugo
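
A CPU-versus-GPU discrepancy like the one reported above can be narrowed down by running the same weights on the same input on both devices and measuring how far the raw outputs diverge. The sketch below uses a small stand-in network; `max_abs_diff_vs_cpu` is a hypothetical helper and assumes the model returns a single tensor. For a model that returns a dictionary, each dense output would need to be compared separately.

```python
import torch


def max_abs_diff_vs_cpu(model, example, device="cuda"):
    """Run `model` on the CPU and on `device`, then return the largest
    absolute difference between the two outputs.

    Returns None if a CUDA device is requested but not available.
    Assumes `model(example)` returns a single tensor.
    """
    if device.startswith("cuda") and not torch.cuda.is_available():
        return None
    model = model.eval()
    with torch.no_grad():
        reference = model.to("cpu")(example.to("cpu"))
        other = model.to(device)(example.to(device)).to("cpu")
    model.to("cpu")  # move the model back to the CPU afterwards
    return (reference - other).abs().max().item()


# Stand-in network for illustration; replace it with the real model.
torch.manual_seed(0)
toy = torch.nn.Sequential(torch.nn.Conv2d(1, 8, 3, padding=1), torch.nn.ReLU())
image = torch.rand(1, 1, 64, 64)
print(max_abs_diff_vs_cpu(toy, image))
```

Small differences from floating-point rounding are normal. A large difference on dense outputs points at a device-specific problem rather than at the weights. As an untested assumption for this particular case, settings such as `torch.backends.cudnn.allow_tf32`, `torch.backends.cuda.matmul.allow_tf32`, and `torch.backends.cudnn.benchmark` are knobs one could toggle while comparing.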

@GoroYeh-HRI

@leblond14u Thanks for asking.
My GPU setup is:
NVIDIA RTX A5000
CUDA Driver Version: 12.5
nvcc -V: cuda 11.6
torch 2.0.1
torchvision 0.15.2

Not sure if this has something to do with the "no descriptor" error I encountered.
@rpautrat do you have any idea?
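
When comparing setups like the two listed in this thread, it helps to report what PyTorch itself sees, since the CUDA version printed by the driver or by `nvcc -V` is not necessarily the one a given PyTorch build was compiled against. The sketch below gathers those values in one place; `describe_environment` is a hypothetical helper name.

```python
import platform

import torch


def describe_environment():
    """Collect version details that are commonly relevant to GPU-only bugs."""
    info = {
        "python": platform.python_version(),
        "torch": torch.__version__,
        "cuda_built_with": torch.version.cuda,    # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
    }
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


for name, value in describe_environment().items():
    print(f"{name}: {value}")
```

Pasting this output into a report makes it easier for others to try the same combination of versions.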

@rpautrat
Owner

@leblond14u, I tested the torch SuperPoint model with recent versions of CUDA/Torch (CUDA 12.6 and Torch 2.4.1), and the detections look normal for me. So I am not sure where your problem is coming from... All I can suggest is to try another set of CUDA/Torch versions and see if this helps to resolve the problem.

@GoroYeh-HRI, which tensorflow version are you using? This repo is using an old version (e.g. 1.12 recommended).

@GoroYeh-HRI

> @leblond14u, I tested the torch SuperPoint model with recent versions of CUDA/Torch (CUDA 12.6 and Torch 2.4.1), and the detections look normal for me. So I am not sure where your problem is coming from... All I can suggest is to try another set of CUDA/Torch versions and see if this helps to resolve the problem.
>
> @GoroYeh-HRI, which tensorflow version are you using? This repo is using an old version (e.g. 1.12 recommended).

Thanks for the prompt reply!
When I used TensorFlow 1.12, I ran into an issue when training MagicPoint: I got loss=nan, precision=nan, recall=0.0.
I read through the GitHub issues and still could not resolve it.
That's why I assume it is due to an incompatibility between the TensorFlow version and my CUDA driver version (12.5).

@rpautrat
Owner

rpautrat commented Nov 19, 2024

Yes, this is very much possible. Unfortunately, this repo is getting old and is probably only compatible with older versions of CUDA. Can you try with an earlier version?

@2896963297

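> @leblond14u, I tested the torch SuperPoint model with recent versions of CUDA/Torch (CUDA 12.6 and Torch 2.4.1), and the detections look normal for me. So I am not sure where your problem is coming from... All I can suggest is to try another set of CUDA/Torch versions and see if this helps to resolve the problem.
> @GoroYeh-HRI, which TensorFlow version are you using? This repo uses an old version (e.g. 1.12 recommended).
>
> Thanks for the prompt reply! When I used TensorFlow 1.12, I ran into an issue when training MagicPoint: I got loss=nan, precision=nan, recall=0.0. I read through the GitHub issues but still could not resolve it. That's why I think it is due to an incompatibility between the TensorFlow version and my CUDA driver version (12.5).

You can try nvidia-tensorflow; that is what I used to solve this when I ran into it before.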
