
v1.8.0 raises Exception if cudnn not found in Program Files #7965

Closed

iperov opened this issue Jun 5, 2021 · 26 comments

iperov (Contributor) commented Jun 5, 2021

v1.8.0 raises an Exception if cuDNN is not found in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin,

but my app is standalone, made for end-users who will not install the CUDA/cuDNN SDK.

The OS LoadLibrary() call automatically picks up CUDA DLLs from the PATH environment variable.

Everything works fine in v1.7.0. Can you fix it?

snnn (Member) commented Jun 7, 2021

When possible, don't use PATH for locating dependent DLLs.

If you put these CUDA DLLs in the same directory as your application exe, it should be fine.

"for end-users who will not install CUDA/CUDNN sdk", why do you provide the onnx runtime GPU build to them? Please tell us more about your usage. Is it a C/C++ program or python?

iperov (Contributor, Author) commented Jun 7, 2021

> If you put these CUDA DLLs in the same directory as your application exe, it should be fine.

I know. My CUDA DLLs are located in the project directory.

But 1.8.0 raises a hard exception if cuDNN is not found in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin:

```python
if not os.path.isfile(os.path.join(cudnn_bin_dir, f"cudnn64_{version_info.cudnn_version}.dll")):
    raise ImportError(f"cuDNN {version_info.cudnn_version} not installed in {cudnn_bin_dir}. "
                      f"Set the CUDNN_HOME environment variable to the path of the 'cuda' directory "
                      f"in your CUDNN installation if necessary.")
```

There was no such code in 1.7.0, and 1.7.0 works fine.

oliviajain assigned oliviajain and skottmckay, and unassigned oliviajain, on Jun 7, 2021
oliviajain (Contributor) commented Jun 7, 2021

The cuDNN documentation asks you to copy the cuDNN files into the CUDA Toolkit directory located in Program Files. Maybe @skottmckay can give more context.

snnn (Member) commented Jun 7, 2021

Now I get it. 1.8 assumes the cuDNN files are located either in the CUDA directory or in %CUDNN_HOME% (an ONNX Runtime-specific environment variable). 1.7 had no such requirement: as long as the DLLs are in %PATH%, it is fine. So this is a breaking change.

iperov (Contributor, Author) commented Jun 7, 2021

> The cuDNN documentation asks you to copy the cuDNN files into the CUDA Toolkit directory located in Program Files. Maybe @skottmckay can give more context.

That is for developers.

I am making an app for END-users. They will use a stand-alone / portable app that includes all necessary dependencies and libraries.

Requiring end-users to install CUDA/cuDNN manually is suicide.

Wake up, developers! What the hell are you doing??

jywu-msft (Member) commented

> > The cuDNN documentation asks you to copy the cuDNN files into the CUDA Toolkit directory located in Program Files. Maybe @skottmckay can give more context.
>
> That is for developers.
>
> I am making an app for END-users. They will use a stand-alone / portable app that includes all necessary dependencies and libraries.
>
> Requiring end-users to install CUDA/cuDNN manually is suicide.
>
> Wake up, developers! What the hell are you doing??

I believe the change was done to address new restrictions for secure Python DLL loading on Windows:
https://bugs.python.org/issue36085
https://docs.python.org/3/whatsnew/3.8.html#bpo-36085-whatsnew
https://docs.python.org/3/library/os.html#os.add_dll_directory
Toblerity/Fiona#851

This is currently only needed for Python 3.8 and above.
So one option could be to move that check to

```python
# Python 3.8 (and later) doesn't search system PATH when loading DLLs, so the CUDA location needs to be
```

?

However, this change will eventually be required for all users as they update their Python version on Windows, so I suspect that is why it is consistently enforced across all versions.
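
For illustration, a minimal sketch of what the Python 3.8+ loading rules require (the DLL path below is hypothetical):

```python
import os
import sys

# On Windows, Python 3.8+ no longer consults PATH when resolving a DLL's
# dependencies; directories must be registered explicitly instead.
if sys.platform == "win32" and sys.version_info >= (3, 8):
    os.add_dll_directory(r"C:\MyApp\cuda\bin")  # hypothetical bundled-DLL location

import onnxruntime  # imported after the DLL directory is registered
```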

skottmckay (Contributor) commented

@ivanst0 added the bulk of the new behavior in #6436, although that PR seemed to be more about handling multiple CUDA versions on one machine.

I included the additional CUDNN_HOME check to be consistent with what the ORT build uses for an explicitly specified path to the cuDNN libraries (part of the build uses the Python bindings for tests). Previously the cuDNN documentation involved putting the binaries in a location separate from CUDA_HOME, but now that that has changed we could remove the usage of CUDNN_HOME from the build etc. That seems like a side issue, though.

Is the requirement that a user install CUDA/cuDNN, or that os.environ needs an entry saying where to find the CUDA DLLs? If it's the latter, short term, could that entry be added prior to importing the onnxruntime Python module, pointing to wherever the CUDA DLLs you want loaded are?

Long term, would it be valid to not fail if the CUDA environment variables aren't found (that doesn't mean the DLLs aren't available), and instead do a check via ctypes.util.find_library after any calls to os.add_dll_directory (if any) are made? i.e. add the CUDA paths we look for using add_dll_directory if found, but also allow for a user having added path information.
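
A rough sketch of the proposed fallback (hypothetical; `cudart64_110` is only an example of the version-dependent library name):

```python
import ctypes.util

# Proposed behavior: register any known CUDA locations first (via
# os.add_dll_directory), then fail only if the runtime library still
# cannot be resolved from any searched location.
if ctypes.util.find_library("cudart64_110") is None:
    raise ImportError("Required CUDA libraries were not found.")
```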

iperov (Contributor, Author) commented Jun 8, 2021

Setting the CUDA/cuDNN bin path via os.environ is fine for me.

ivanst0 (Member) commented Jun 8, 2021

Yes, if you are distributing CUDA/cuDNN DLLs with your Python app/package (in <LIB_DIR>\bin), I recommend setting the appropriate environment variable (e.g. CUDA_PATH_V11_2) to <LIB_DIR>, instead of prepending <LIB_DIR>\bin to PATH, before importing the onnxruntime package. This works across all supported Python versions (3.6-3.9).
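
For illustration, a minimal sketch of this approach (LIB_DIR is a hypothetical bundle location; the variable name matches the CUDA version ORT was built with):

```python
import os

# Hypothetical layout: the app ships CUDA/cuDNN DLLs in <LIB_DIR>\bin.
LIB_DIR = r"C:\MyApp\cuda"

# Point onnxruntime at the bundled libraries before importing it; the
# variable name corresponds to the CUDA version ORT was built with (11.2 here).
os.environ["CUDA_PATH_V11_2"] = LIB_DIR

import onnxruntime  # resolves its CUDA DLLs from LIB_DIR\bin
```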

@iperov, if this solution works for you please feel free to close this issue.

skottmckay (Contributor) commented

@ivanst0 Is there a reason why we need to force someone to set the environment variable if the CUDA library would have been found anyway?

i.e. what would the issue be with making the calls to add_dll_directory if paths are available via the environment variable(s), but only failing if ctypes.util.find_library can't find the required CUDA libraries?

If possible, that seems slightly cleaner and more user-friendly to me, as the user doesn't need to discover the correct incantation for the CUDA_PATH_V... environment variable name (given it's based on the CUDA version ORT was built with).

iperov (Contributor, Author) commented Jun 8, 2021

Agree.

Also, I am using onnxruntime with pytorch (latest version), which ships CUDA libraries in site-packages, so onnxruntime 1.7.0 uses them automatically because they are accessible through PATH.

[screenshot: explorer_2021-06-08_15-47-41]

snnn (Member) commented Jun 8, 2021

> onnxruntime 1.7.0 uses them automatically because they are accessible through PATH

It just happened to work. ONNX Runtime and pytorch require different CUDA and cuDNN versions. Even when the file names are the same, the versions are different.

iperov (Contributor, Author) commented Jun 8, 2021

Why different, if I chose torch==1.8.1+cu111?
onnxruntime 1.7.0 uses CUDA 11.0,
onnxruntime 1.8.0 uses CUDA 11.1,
and CUDA provides minor-version backward compatibility.

snnn (Member) commented Jun 8, 2021

> onnxruntime 1.8.0 uses 11.0

CUDA provides minor-version backward compatibility starting from 11.1.

And what about the cuDNN version?

And what if onnxruntime was built with a newer CUDA version than pytorch?

iperov (Contributor, Author) commented Jun 8, 2021

Please don't go off-topic.

jywu-msft (Member) commented

We plan on patching the 1.8 release to fix this issue.

iperov (Contributor, Author) commented Jul 13, 2021

Seems like someone removed that code from pybind_state, and it now works like 1.7.0.

What is the solution for secure DLL loading going forward?

iperov closed this as completed on Jul 14, 2021
iperov (Contributor, Author) commented Aug 21, 2022

Looks like onnxruntime-gpu==1.12.1 does not work with CUDA 11.5+. The error is:

Please make sure cudnn_cnn_infer64_8.dll is in your library path!

CUDA 11.3 is fine.

@jywu-msft Can you write the CUDA version requirements on the release page?

jywu-msft (Member) commented

> Looks like onnxruntime-gpu==1.12.1 does not work with CUDA 11.5+. The error is: Please make sure cudnn_cnn_infer64_8.dll is in your library path!
>
> CUDA 11.3 is fine.
>
> @jywu-msft Can you write the CUDA version requirements on the release page?

11.5 should work. I just tested it with the onnxruntime-gpu 1.12.1 Python package and it worked fine.
cudnn_cnn_infer64_8.dll is part of the CUDA 11.5 installation.
Any other details about your environment? Where are the CUDA 11.5 libs? What's in your PATH?

iperov (Contributor, Author) commented Aug 22, 2022

I don't use a CUDA "installation". CUDA is not an installation to me, just a bunch of DLLs.

I am using the CUDA DLLs from the torch pip package.

torch==1.10.0+cu113 torchvision==0.11.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html

works fine with 1.12.1, but the DLLs from

python -m pip install torch==1.11.0+cu115 torchvision==0.12.0+cu115 -f https://download.pytorch.org/whl/torch_stable.html

do not work.

jywu-msft (Member) commented

> I don't use a CUDA "installation". CUDA is not an installation to me, just a bunch of DLLs.
>
> I am using the CUDA DLLs from the torch pip package. torch==1.10.0+cu113 torchvision==0.11.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html works fine with 1.12.1,
>
> but the DLLs from python -m pip install torch==1.11.0+cu115 torchvision==0.12.0+cu115 -f https://download.pytorch.org/whl/torch_stable.html do not work.

I tested torch==1.11.0+cu115 and that worked too.
I added the location of the CUDA 11.5 libs to my PATH (in my environment, that's c:\Program Files (x86)\Microsoft Visual Studio\Shared\Python39_64\Lib\site-packages\torch\lib) and it worked.
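
For reference, a minimal sketch of that setup in Python (assumes torch is importable; the lib directory is derived rather than hard-coded):

```python
import os
import torch

# Derive torch's bundled CUDA/cuDNN DLL directory (site-packages\torch\lib)
# and prepend it to PATH so dependent DLLs such as cudnn_cnn_infer64_8.dll
# can be resolved when cuDNN loads its sub-libraries.
torch_lib = os.path.join(os.path.dirname(torch.__file__), "lib")
os.environ["PATH"] = torch_lib + os.pathsep + os.environ.get("PATH", "")

import onnxruntime  # imported after PATH is prepared
```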

iperov (Contributor, Author) commented Aug 22, 2022

Because it uses DLLs from your already-installed "CUDA kit", or from other dirs in PATH.

I am using a builder of a portable all-in-one folder for the DeepFaceLive project (written by me) (https://github.com/iperov/DeepFaceLive), where PATH is limited to the folder.
It has a CUDA bin directory with DLLs from torch==1.11.0+cu115, and it does not work: Could not load library cudnn_cnn_infer64_8.dll. Error code 126
But cu113 works. Thus I cannot upgrade the project to cu115 due to this issue.
I can send you this folder for testing.

[screenshot: cmd_2022-08-22_21-58-57]

[screenshot: cmd_2022-08-22_21-59-56]

jywu-msft (Member) commented

> Because it uses DLLs from your already-installed "CUDA kit", or from other dirs in PATH.
>
> I am using a builder of a portable all-in-one folder for the DeepFaceLive project (written by me) (https://github.com/iperov/DeepFaceLive), where PATH is limited to the folder. It has a CUDA bin directory with DLLs from torch==1.11.0+cu115, and it does not work: Could not load library cudnn_cnn_infer64_8.dll. Error code 126. But cu113 works. Thus I cannot upgrade the project to cu115 due to this issue. I can send you this folder for testing.

That error message says it could not load cudnn_cnn_infer64_8.dll, but it can also mean that one of that DLL's dependencies is missing. I suspect that is the most likely cause.
I tested against the CUDA lib location installed by torch (and removed all references in PATH to any system NVIDIA toolkit locations) and it worked.
Try the suggestions in #6435 to see if they help (e.g. try running Dependency Walker on cudnn_cnn_infer64_8.dll).
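
As an alternative to Dependency Walker, if Visual Studio's tools are available, dumpbin can list a DLL's direct imports, which is a quick way to spot the missing dependency (run from a Developer Command Prompt):

```
dumpbin /dependents cudnn_cnn_infer64_8.dll
```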

iperov (Contributor, Author) commented Aug 24, 2022

OK, I will check.

There is another issue.

The same model produces different results with onnxruntime-gpu==1.12.1 and onnxruntime-gpu==1.11.0. The new version produces buggy inference.

Every new release you introduce new bugs!! So tired.

jywu-msft (Member) commented Aug 24, 2022

> OK, I will check.
>
> There is another issue.
>
> The same model produces different results with onnxruntime-gpu==1.12.1 and onnxruntime-gpu==1.11.0. The new version produces buggy inference.
>
> Every new release you introduce new bugs!! So tired.

I understand your frustration about bugs. Unfortunately, bugs come with new features and changes. We will try our best to do better testing and avoid regressions. You can help us by opting into our Release Candidate testing: every release there is a period of a couple of weeks where we publish Release Candidates and users can report issues before we finalize the release.
e.g. see #12133
Thank you!

Can you please file a separate issue with repro steps and assets for the "buggy inference" problem? Otherwise it gets buried in this closed issue and others won't see it.
It's difficult to say whether there is an ORT regression at this point. (I think you also updated the CUDA version, right?)
Which execution provider do you use? CUDAExecutionProvider? Does the issue occur with CPUExecutionProvider?

iperov (Contributor, Author) commented Aug 24, 2022

Check issue #12706.
