
CUDNN_STATUS_NOT_INITIALIZED #16

Open
sak96 opened this issue Dec 4, 2022 · 6 comments
sak96 commented Dec 4, 2022

Building the autoencoder.
Building the unet.
Timestep 0/30
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Torch("cuDNN error: CUDNN_STATUS_NOT_INITIALIZED\nException raised from createCuDNNHandle at /build/python-pytorch/src/pytorch-1.13.0-cuda/aten/src/ATen/cudnn/Handle.cpp:9 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x92 (0x7fdb34705bf2 in /usr/lib/libc10.so)\nframe #1: <unknown function> + 0xd89413 (0x7fdaeab89413 in /usr/lib/libtorch_cuda.so)\nframe #2: at::native::getCudnnHandle() + 0x7b8 (0x7fdaeaeded18 in /usr/lib/libtorch_cuda.so)\nframe #3: <unknown function> + 0x1055f89 (0x7fdaeae55f89 in /usr/lib/libtorch_cuda.so)\nframe #4: <unknown function> + 0x10505f4 (0x7fdaeae505f4 in /usr/lib/libtorch_cuda.so)\nframe #5: at::native::cudnn_convolution(at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, bool) + 0xad (0x7fdaeae50a2d in /usr/lib/libtorch_cuda.so)\nframe #6: <unknown function> + 0x3882bf4 (0x7fdaed682bf4 in /usr/lib/libtorch_cuda.so)\nframe #7: <unknown function> + 0x3882cad (0x7fdaed682cad in /usr/lib/libtorch_cuda.so)\nframe #8: at::_ops::cudnn_convolution::call(at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, bool) + 0x226 (0x7fdae0416e46 in /usr/lib/libtorch_cpu.so)\nframe #9: at::native::_convolution(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long, bool, bool, bool, bool) + 0x1097 (0x7fdadf821ff7 in /usr/lib/libtorch_cpu.so)\nframe #10: <unknown function> + 0x258518e (0x7fdae078518e in /usr/lib/libtorch_cpu.so)\nframe #11: at::_ops::_convolution::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, 
long, bool, bool, bool, bool) + 0x299 (0x7fdadff96759 in /usr/lib/libtorch_cpu.so)\nframe #12: at::native::convolution(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long) + 0x111 (0x7fdadf814bd1 in /usr/lib/libtorch_cpu.so)\nframe #13: <unknown function> + 0x2584c3e (0x7fdae0784c3e in /usr/lib/libtorch_cpu.so)\nframe #14: at::_ops::convolution::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long) + 0x15d (0x7fdadff43b3d in /usr/lib/libtorch_cpu.so)\nframe #15: <unknown function> + 0x43b0226 (0x7fdae25b0226 in /usr/lib/libtorch_cpu.so)\nframe #16: <unknown function> + 0x43b1060 (0x7fdae25b1060 in /usr/lib/libtorch_cpu.so)\nframe #17: at::_ops::convolution::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long) + 0x247 (0x7fdadff95a27 in /usr/lib/libtorch_cpu.so)\nframe #18: at::native::conv2d(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long) + 0x20d (0x7fdadf81908d in /usr/lib/libtorch_cpu.so)\nframe #19: <unknown function> + 0x273d3f6 (0x7fdae093d3f6 in /usr/lib/libtorch_cpu.so)\nframe #20: at::_ops::conv2d::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long) + 0x202 (0x7fdae053fad2 in /usr/lib/libtorch_cpu.so)\nframe #21: <unknown function> + 0x2f359e (0x55ad31d2559e in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #22: <unknown function> + 0x2fe5da (0x55ad31d305da in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #23: <unknown function> + 0x2b79bd (0x55ad31ce99bd in 
$HOME.cargo-target/debug/examples/stable-diffusion)\nframe #24: <unknown function> + 0x2bd561 (0x55ad31cef561 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #25: <unknown function> + 0x2e1ad0 (0x55ad31d13ad0 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #26: <unknown function> + 0x2c05f1 (0x55ad31cf25f1 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #27: <unknown function> + 0xd90f5 (0x55ad31b0b0f5 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #28: <unknown function> + 0x96438 (0x55ad31ac8438 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #29: <unknown function> + 0x975a1 (0x55ad31ac95a1 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #30: <unknown function> + 0xb496b (0x55ad31ae696b in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #31: <unknown function> + 0xa10ae (0x55ad31ad30ae in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #32: <unknown function> + 0xacbf1 (0x55ad31adebf1 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #33: <unknown function> + 0x621aee (0x55ad32053aee in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #34: <unknown function> + 0xacbc0 (0x55ad31adebc0 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #35: <unknown function> + 0x9b83c (0x55ad31acd83c in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #36: <unknown function> + 0x23290 (0x7fdade03c290 in /usr/lib/libc.so.6)\nframe #37: __libc_start_main + 0x8a (0x7fdade03c34a in /usr/lib/libc.so.6)\nframe #38: <unknown function> + 0x91905 (0x55ad31ac3905 in $HOME.cargo-target/debug/examples/stable-diffusion)\n")', $HOME.cargo/registry/src/github.com-1ecc6299db9ec823/tch-0.9.0/src/wrappers/tensor_generated.rs:6457:72
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
   2: core::result::unwrap_failed
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1785:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1078:23
   4: tch::wrappers::tensor_generated::<impl tch::wrappers::tensor::Tensor>::conv2d
             at $HOME.cargo/registry/src/github.com-1ecc6299db9ec823/tch-0.9.0/src/wrappers/tensor_generated.rs:6457:9
   5: <tch::nn::conv::Conv<[i64; 2]> as tch::nn::module::Module>::forward
             at $HOME.cargo/registry/src/github.com-1ecc6299db9ec823/tch-0.9.0/src/nn/conv.rs:216:9
   6: tch::nn::module::<impl tch::wrappers::tensor::Tensor>::apply
             at $HOME.cargo/registry/src/github.com-1ecc6299db9ec823/tch-0.9.0/src/nn/module.rs:47:9
   7: diffusers::models::unet_2d::UNet2DConditionModel::forward
             at ./src/models/unet_2d.rs:237:18
   8: stable_diffusion::run
             at ./examples/stable-diffusion/main.rs:167:30
   9: stable_diffusion::main
             at ./examples/stable-diffusion/main.rs:200:9
  10: core::ops::function::FnOnce::call_once
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Not sure whether this is an issue with the library or something else in my setup.

ajmwagar commented Dec 7, 2022

Dumb question: do you have a CUDA-enabled GPU on your system?

sak96 commented Dec 8, 2022

Oh yeah, I forgot to give details about the machine.

% lspci | grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation GA106M [GeForce RTX 3060 Mobile / Max-Q] (rev a1)
05:00.0 VGA compatible controller: Advanced Micro Devices ....

% nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   32C    P3    N/A /  N/A |      5MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       666      G   /usr/lib/Xorg                       4MiB |
+-----------------------------------------------------------------------------+

6 GB RTX 3060.

% nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

I am using `export TORCH_CUDA_VERSION=cu118`.
Would any other details be required?

I am using the f16 model:

% sha256sum unet16.bin
5019a4fbb455dd9b75192afc3ecf8a8ec875e83812fd51029d2e19277edddebc  unet16.bin
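For reference, `TORCH_CUDA_VERSION` is the variable the torch-sys build script reads to pick the CUDA flavour of libtorch; a minimal sketch of setting it before `cargo build` (the `cu118` value matches the nvcc 11.8 output above):

```shell
# TORCH_CUDA_VERSION tells the torch-sys build script which CUDA flavour
# of libtorch to use; cu118 matches the CUDA 11.8 toolkit shown above.
export TORCH_CUDA_VERSION=cu118
echo "TORCH_CUDA_VERSION=$TORCH_CUDA_VERSION"
```

Note this only affects which libtorch torch-sys builds against, not which system libraries the binary loads at runtime.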

@rockerBOO

You could try something like

println!("Cuda available: {}", tch::Cuda::is_available());
println!("Cudnn available: {}", tch::Cuda::cudnn_is_available());

to see whether the tch library can see it.

sak96 commented Jan 2, 2023

Cuda available: true
Cudnn available: true
Cuda available: true
Cudnn available: true

I am not sure why it printed everything twice, though.

--- a/examples/stable-diffusion/main.rs
+++ b/examples/stable-diffusion/main.rs
@@ -196,6 +196,8 @@ fn run(args: Args) -> anyhow::Result<()> {

 fn main() -> anyhow::Result<()> {
     let args = Args::parse();
+    println!("Cuda available: {}", tch::Cuda::is_available());
+    println!("Cudnn available: {}", tch::Cuda::cudnn_is_available());
     if !args.autocast {
         run(args)
     } else {

EDIT:
I found that the CUDA check is also part of the existing code, which would explain the duplicate output.

sak96 commented Jan 10, 2023

One solution I found was pytorch/pytorch#16831 (comment):

tch::Cuda::cudnn_set_benchmark(false);

This did not help.

There is another issue with the same symptoms: tensorflow/tensorflow#6698 (comment) and https://stackoverflow.com/a/52634209,
but I don't see any API in tch-rs that could be used to do the same. If you have any idea, please let me know.

eoriont commented May 25, 2023

Do you have libtorch installed? I had this issue, then fixed it, then forgot exactly what fixed it 😑
The things I tried were: installing the CUDA version of PyTorch, installing CUDA 11.8, installing cuDNN 8.9.1, and installing libtorch. After libtorch, it just worked magically.
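For anyone landing here, a minimal sketch of what "installing libtorch" looks like for tch: the `LIBTORCH` and `LD_LIBRARY_PATH` variables are the ones the tch-rs README describes for pointing the build at a manually downloaded libtorch; the path below is an assumption, adjust it to wherever the libtorch zip was extracted.

```shell
# Hedged sketch: point the tch build and the dynamic linker at a
# manually downloaded libtorch (example path, not a real install location).
export LIBTORCH="$HOME/libtorch"
export LD_LIBRARY_PATH="$LIBTORCH/lib:$LD_LIBRARY_PATH"
echo "LIBTORCH=$LIBTORCH"
```

Using a libtorch that matches the CUDA/cuDNN versions on the system, rather than a distro-packaged `/usr/lib/libtorch_cuda.so` built against a different toolkit, is one plausible reason this fixed the error.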
