Weird reproducibility error at inference #332

leblond14u · 2024-11-08T09:42:52Z

Hi,

I'm using LightGlue with SuperPoint and I noticed some weird SuperPoint behavior.
When extracted on the sacre_coeur data (with no maximum number of points), my keypoints look weird:

When extracted with a maximum of 2048 points, the keypoints look a bit better, but still ...

This weird behavior then leads to no matches between the two images ...

Has anybody encountered the same issue with the descriptor, or does anyone know how to solve it?

Thanks in advance,
Best regards,

Hugo

rpautrat · 2024-11-08T10:06:49Z

Hi, are you sure that the SuperPoint is correctly initialized, i.e. with the right pre-trained weights loaded?

Do you get the same if you test it on other images?

leblond14u · 2024-11-08T10:18:10Z

Hi,
Thanks for your answer.
I got the same kind of results for the easy DSC_0410 scenario:

My weights are loaded from https://github.com/cvg/LightGlue/releases/download/v0.1_arxiv/superpoint_v1.pth.
Still no matches are found ...

GoroYeh-HRI · 2024-11-12T01:21:43Z

Hi,
I'm running the match_features_demo.py using the pretrained model sp_v6.
As instructed here:
https://github.com/rpautrat/SuperPoint?tab=readme-ov-file#matching-features-demo-with-pretrained-weights

However, the second descriptor array extracted by the SuperPoint model has shape (0, 256),
which leads to an error, so the two images cannot be matched.
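
A guard placed before the matching step makes this failure explicit instead of letting it surface as a shape error inside the matcher. The following is a minimal sketch: `validate_descriptors` is a hypothetical helper, not part of the SuperPoint repo, and it assumes the descriptors are available as an (N, 256)-shaped sequence with one row per detected keypoint.

```python
def validate_descriptors(name, descriptors, expected_dim=256):
    """Return the number of descriptors, or raise if none were extracted.

    `descriptors` is assumed to be a sequence of rows (a list of lists or a
    2-D array), one row per detected keypoint.
    """
    count = len(descriptors)
    if count == 0:
        # An array of shape (0, 256) ends up here: no keypoint was detected.
        raise ValueError(f"{name}: no keypoints detected, nothing to match")
    dim = len(descriptors[0])
    if dim != expected_dim:
        raise ValueError(
            f"{name}: expected {expected_dim}-D descriptors, got {dim}-D"
        )
    return count
```

Calling it on both images before matching shows which of the two produced no detections.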

This is the pair of images being loaded.
They are hpatches-sequences-release/i_pool/1.ppm and 6.ppm.

Any idea why I'd get an empty (zero-row) descriptor array from the darker image?
Thanks.

leblond14u · 2024-11-15T10:36:05Z

Update:
@rpautrat It seems my problem is specific to inference on CUDA devices.
When running SuperPoint on my CPU, I have no trouble finding good matches on the sacre_coeur scenario.
My setup is:

Nvidia A5000
CUDA Version: 12.2
torch 2.1.2
torchvision 0.16.2
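
To quantify a CPU/GPU discrepancy like this one, the keypoints from both runs on the same image can be compared directly. The sketch below is illustrative only: `keypoint_agreement` is a hypothetical helper, and it assumes the keypoints from each run have already been copied to the host and converted to lists of (x, y) pairs.

```python
import math


def keypoint_agreement(cpu_kpts, gpu_kpts, tol=1.0):
    """Fraction of CPU keypoints that have a GPU keypoint within `tol` pixels."""
    if not cpu_kpts:
        return 0.0
    matched = 0
    for cx, cy in cpu_kpts:
        # Brute-force search; fine for a few thousand keypoints.
        if any(math.hypot(cx - gx, cy - gy) <= tol for gx, gy in gpu_kpts):
            matched += 1
    return matched / len(cpu_kpts)


# Made-up coordinates, for illustration only.
cpu = [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]
gpu = [(10.2, 20.1), (30.0, 40.0), (200.0, 5.0)]
print(keypoint_agreement(cpu, gpu))
```

A value close to 1.0 would suggest both devices detect essentially the same points, while a low value would confirm that the detections themselves differ between devices.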

@GoroYeh-HRI Is your problem also related to GPU usage?

Best,

Hugo

GoroYeh-HRI · 2024-11-15T20:48:49Z

@leblond14u Thanks for asking.
My GPU setup is:
NVIDIA RTX A5000
CUDA Driver Version: 12.5
nvcc -V: cuda 11.6
torch 2.0.1
torchvision 0.15.2

Not sure if this has something to do with the "no descriptor" error I ran into.
@rpautrat do you have any idea?
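
When comparing setups like the ones above, the installed package versions can be collected programmatically rather than by hand. This is a small sketch using only the standard library; `installed_versions` is a hypothetical helper and the default package list is an assumption based on the packages mentioned in this thread. It reports Python package versions only, not the CUDA driver version.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "tensorflow")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```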

rpautrat · 2024-11-18T09:27:51Z

@leblond14u, I tested the torch SuperPoint model with recent versions of CUDA/Torch (CUDA 12.6 and Torch 2.4.1), and the detections look normal for me. So I am not sure where your problem is coming from... All I can suggest is to try another set of CUDA/Torch versions and see if this helps to resolve the problem.

@GoroYeh-HRI, which tensorflow version are you using? This repo is using an old version (e.g. 1.12 recommended).

GoroYeh-HRI · 2024-11-18T18:07:11Z

> @leblond14u, I tested the torch SuperPoint model with recent versions of CUDA/Torch (CUDA 12.6 and Torch 2.4.1), and the detections look normal for me. So I am not sure where your problem is coming from... All I can suggest is to try another set of CUDA/Torch versions and see if this helps to resolve the problem.
>
> @GoroYeh-HRI, which tensorflow version are you using? This repo is using an old version (e.g. 1.12 recommended).

Thanks for the prompt reply!
When I used TensorFlow 1.12, I ran into an issue when training MagicPoint:
I got loss=nan, precision=nan, recall=0.0.
I read through the GitHub issues and still could not resolve it.
That's why I assume it is due to an incompatibility between the TensorFlow version and my CUDA driver version (12.5).
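
When chasing a loss=nan like this, it can help to find the first training step at which the loss stops being finite. The sketch below assumes the per-step loss values have been collected as plain Python floats; `first_non_finite` is a hypothetical helper, not part of the SuperPoint training code, and the loss values are made up.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Made-up loss history, for illustration only.
history = [0.93, 0.71, float("nan"), float("nan")]
print(first_non_finite(history))  # → 2
```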

rpautrat · 2024-11-19T07:47:56Z

Yes, this is very much possible. Unfortunately, this repo is getting old and is probably only compatible with older versions of CUDA. Can you try with an earlier version?

2896963297 · 2024-12-04T07:31:51Z

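> @leblond14u, I tested the torch SuperPoint model with recent versions of CUDA/Torch (CUDA 12.6 and Torch 2.4.1), and the detections look normal for me. So I am not sure where your problem is coming from... All I can suggest is to try another set of CUDA/Torch versions and see if this helps to resolve the problem.
> @GoroYeh-HRI, which tensorflow version are you using? This repo is using an old version (e.g. 1.12 recommended).
>
> Thanks for the prompt reply! When I used 1.12 tensorflow, I got issue when training the MagicPoint. The issue is: I got loss=nan, precision=nan, recall=0.0. I read through the Github issues and still could not resolve this issue. That's why I assume this is due to the incompatibility between the tensorflow version and my CUDA driver version (12.5)

You can try nvidia-tensorflow; that is what I used to solve this when I ran into it before.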
