None of the algorithms provided by cuDNN heuristics worked; trying fallback algorithms #86

Open
shiyi099 opened this issue Mar 27, 2024 · 2 comments

Comments

@shiyi099

shiyi099 commented Mar 27, 2024

My Python package configuration is:
jax == 0.4.13
jaxlib == 0.4.13+cu12+cudnn89

My hardware is:
Nvidia H800

OS: Ubuntu 20.04 LTS x86_64

When I try to run inference with the TAPNET or TAPIR models, it prints “None of the algorithms provided by cuDNN heuristics worked; trying fallback algorithms”. The outputs also seem wrong, and they differ from the outputs I get on Windows (RTX 3060). What is wrong, and how can I solve it?

@cdoersch
Collaborator

This seems more like a JAX issue than an issue with TAPIR. I unfortunately don't have easy access to H800 GPUs and so I can't easily reproduce this issue. Your best bet might be to try to isolate the op which is producing different outputs on the two different devices, and then create a simple reproduction of the issue that you can use to file a bug against JAX or XLA.
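
One possible way to narrow this down, as a rough sketch rather than anything TAPIR or JAX provides: dump the intermediate outputs on each machine (same inputs, same weights) and look for the first one that disagrees. The helper name `first_divergence`, the layer names, and the tolerances below are all hypothetical, and the two dictionaries stand in for dumps you would load from files yourself.

```python
import numpy as np


def first_divergence(reference, candidate, rtol=1e-3, atol=1e-3):
    """Return (name, max_abs_diff) for the first mismatching array, or None.

    Both arguments map an op/layer name to its output array, in execution
    order. The names and tolerances are illustrative assumptions.
    """
    for name, ref in reference.items():
        ref = np.asarray(ref, dtype=np.float64)
        got = np.asarray(candidate[name], dtype=np.float64)
        if ref.shape != got.shape:
            # A shape mismatch is reported as an infinite difference.
            return name, float("inf")
        if not np.allclose(ref, got, rtol=rtol, atol=atol):
            return name, float(np.max(np.abs(ref - got)))
    return None


# Hypothetical dumps: in practice these would be loaded with np.load(...)
# from files written with np.savez(...) on each machine.
rtx3060 = {"conv1": np.ones((2, 3)), "conv2": np.full((2, 3), 2.0)}
h800 = {"conv1": np.ones((2, 3)), "conv2": np.full((2, 3), 2.5)}

result = first_divergence(rtx3060, h800)
print(result)
```

The first name reported is a reasonable candidate for the op to extract into a standalone reproduction, since everything before it agreed within tolerance.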

@shiyi099
Author

shiyi099 commented Mar 28, 2024

This seems more like a JAX issue than an issue with TAPIR. I unfortunately don't have easy access to H800 GPUs and so I can't easily reproduce this issue. Your best bet might be to try to isolate the op which is producing different outputs on the two different devices, and then create a simple reproduction of the issue that you can use to file a bug against JAX or XLA.

Thank you! Nothing I change in the JAX configuration seems to make a difference. I tried switching my jax version to 0.4.x (cudnn88), but unfortunately the same message appears. jax-ml/jax#17523 describes a similar problem. I have now tried the PyTorch models of TAPNET on the H800, after modifying some code in the tapnet/pytorch folder. It works!
