
Does not work for CUDA Version: 12.4 #1152

Closed
Jebati opened this issue Mar 27, 2024 · 7 comments

Comments

@Jebati

Jebati commented Mar 27, 2024

System Info

```
kohya-ss-gui  | ================================================================================
kohya-ss-gui  | The following directories listed in your path were found to be non-existent: {PosixPath('/home/1000/.local/lib/python3.10/site-packages/cv2/../../lib64')}
kohya-ss-gui  | The following directories listed in your path were found to be non-existent: {PosixPath('//github.com/pypa/get-pip/raw/dbf0c85f76fb6e1ab42aa672ffca6f0a675d9ee4/public/get-pip.py'), PosixPath('https')}
kohya-ss-gui  | CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
kohya-ss-gui  | The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
kohya-ss-gui  | DEBUG: Possible options found for libcudart.so: set()
kohya-ss-gui  | CUDA SETUP: PyTorch settings found: CUDA_VERSION=121, Highest Compute Capability: 8.9.
kohya-ss-gui  | CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
kohya-ss-gui  | CUDA SETUP: Loading binary /home/1000/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
kohya-ss-gui  | libcusparse.so.12: cannot open shared object file: No such file or directory
kohya-ss-gui  | CUDA SETUP: Problem: The main issue seems to be that the main CUDA runtime library was not detected.
kohya-ss-gui  | CUDA SETUP: Solution 1: To solve the issue the libcudart.so location needs to be added to the LD_LIBRARY_PATH variable
kohya-ss-gui  | CUDA SETUP: Solution 1a): Find the cuda runtime library via: find / -name libcudart.so 2>/dev/null
kohya-ss-gui  | CUDA SETUP: Solution 1b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_1a
kohya-ss-gui  | CUDA SETUP: Solution 1c): For a permanent solution add the export from 1b into your .bashrc file, located at ~/.bashrc
kohya-ss-gui  | CUDA SETUP: Solution 2: If no library was found in step 1a) you need to install CUDA.
kohya-ss-gui  | CUDA SETUP: Solution 2a): Download CUDA install script: wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh
kohya-ss-gui  | CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.
kohya-ss-gui  | CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local
kohya-ss-gui  |     train(args)
kohya-ss-gui  |   File "/app/sd-scripts/train_db.py", line 180, in train
kohya-ss-gui  |     _, _, optimizer = train_util.get_optimizer(args, trainable_params)
kohya-ss-gui  |   File "/app/sd-scripts/library/train_util.py", line 3616, in get_optimizer
kohya-ss-gui  |     import bitsandbytes as bnb
kohya-ss-gui  |   File "/home/1000/.local/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
kohya-ss-gui  |     from . import cuda_setup, research, utils
kohya-ss-gui  |   File "/home/1000/.local/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 2, in <module>
kohya-ss-gui  |     from .autograd._functions import (
kohya-ss-gui  |   File "/home/1000/.local/lib/python3.10/site-packages/bitsandbytes/research/autograd/_functions.py", line 8, in <module>
kohya-ss-gui  |     from bitsandbytes.autograd._functions import GlobalOutlierPooler, MatmulLtState
kohya-ss-gui  |   File "/home/1000/.local/lib/python3.10/site-packages/bitsandbytes/autograd/__init__.py", line 1, in <module>
kohya-ss-gui  |     from ._functions import get_inverse_transform_indices, undo_layout
kohya-ss-gui  |   File "/home/1000/.local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 10, in <module>
kohya-ss-gui  |     import bitsandbytes.functional as F
kohya-ss-gui  |   File "/home/1000/.local/lib/python3.10/site-packages/bitsandbytes/functional.py", line 17, in <module>
kohya-ss-gui  |     from .cextension import COMPILED_WITH_CUDA, lib
kohya-ss-gui  |   File "/home/1000/.local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 17, in <module>
kohya-ss-gui  |     raise RuntimeError('''
kohya-ss-gui  | RuntimeError: 
kohya-ss-gui  |         CUDA Setup failed despite GPU being available. Please run the following command to get more information:
kohya-ss-gui  | 
kohya-ss-gui  |         python -m bitsandbytes
kohya-ss-gui  | 
kohya-ss-gui  |         Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
kohya-ss-gui  |         to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
kohya-ss-gui  |         and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
kohya-ss-gui  | Traceback (most recent call last):
kohya-ss-gui  |   File "/home/1000/.local/bin/accelerate", line 8, in <module>
kohya-ss-gui  |     sys.exit(main())
kohya-ss-gui  |   File "/home/1000/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
kohya-ss-gui  |     args.func(args)
kohya-ss-gui  |   File "/home/1000/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
kohya-ss-gui  |     simple_launcher(args)
kohya-ss-gui  |   File "/home/1000/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
kohya-ss-gui  |     raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
kohya-ss-gui  | subprocess.CalledProcessError: Command '['/usr/local/bin/python', '/app/sd-scripts/train_db.py', '--bucket_no_upscale', '--bucket_reso_steps=64', '--cache_latents', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--learning_rate=1e-05', '--learning_rate_te=1e-05', '--lr_scheduler=cosine', '--lr_scheduler_num_cycles=1', '--max_data_loader_n_workers=0', '--resolution=512,512', '--max_train_steps=5', '--mixed_precision=fp16', '--optimizer_type=AdamW8bit', '--output_name=last', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--save_every_n_epochs=1', '--save_model_as=safetensors', '--save_precision=fp16', '--train_batch_size=1', '--train_data_dir=/dataset/images', '--xformers']' returned non-zero exit status 1.
exit
```

```
# nvidia-smi
Wed Mar 27 13:17:47 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:05:00.0 Off |                  Off |
| 31%   48C    P2             66W /  450W |    2745MiB /  24564MiB |     14%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```

Reproduction

Always

Expected behavior

Work

@Titus-von-Koeller
Collaborator

Yes, we're aware of this. We'll start supporting the latest CUDA version with the next release. The Docker image wasn't out yet the last time we checked, so it hasn't been straightforward to support in our CI setup so far.

Please compile from source for now, then everything should work perfectly fine.
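
For reference, a minimal sketch of a from-source build (this assumes the standard bitsandbytes CMake flow with the CUDA backend; adjust paths and options to your environment):

```bash
# Sketch: build bitsandbytes from source against the locally installed CUDA toolkit.
# Assumes git, CMake, a C++ compiler, and the CUDA toolkit are already available.
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
cmake -DCOMPUTE_BACKEND=cuda -S .
make
pip install .
```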

CC @matthewdouglas

@matthewdouglas
Member

@Jebati Can you please try these steps from the output?

CUDA SETUP: Problem: The main issue seems to be that the main CUDA runtime library was not detected.
CUDA SETUP: Solution 1: To solve the issue the libcudart.so location needs to be added to the LD_LIBRARY_PATH variable
CUDA SETUP: Solution 1a): Find the cuda runtime library via: find / -name libcudart.so 2>/dev/null
CUDA SETUP: Solution 1b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_1a

I would suspect you might find libcudart.so in /home/1000/.local/lib/python3.10/site-packages/nvidia/cuda_runtime/lib. Additionally you might find libcublas.so and libcublasLt.so in /home/1000/.local/lib/python3.10/site-packages/nvidia/cublas/lib, and libcusparse.so in /home/1000/.local/lib/python3.10/site-packages/nvidia/cusparse/lib. These paths may need to be added to LD_LIBRARY_PATH in order for everything to work correctly.
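
For illustration, a rough sketch of what that could look like (the exact paths are the guesses above; confirm them with the `find` command from step 1a before exporting anything):

```bash
# Sketch: point LD_LIBRARY_PATH at the CUDA libraries bundled with the pip-installed
# nvidia-* wheels. The paths below are assumptions; verify they exist on your system.
SITE_PACKAGES=/home/1000/.local/lib/python3.10/site-packages
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SITE_PACKAGES/nvidia/cuda_runtime/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SITE_PACKAGES/nvidia/cublas/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SITE_PACKAGES/nvidia/cusparse/lib

# Sanity check: bitsandbytes should now be able to locate libcudart/libcusparse.
python -m bitsandbytes
```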

Note for myself: relates to #1126

@Titus-von-Koeller
Collaborator

@matthewdouglas was just explaining to me that the key line is

kohya-ss-gui  | CUDA SETUP: PyTorch settings found: CUDA_VERSION=121, Highest Compute Capability: 8.9.

and I agree with his assertions:

I wouldn't pay too much attention to the CUDA version reported in nvidia-smi output unless it's really old. The CUDA version shown there is just the maximum CUDA version the driver supports; it won't always match the CUDA toolkit that's installed or the one PyTorch was built with.

That means it will try to load libbitsandbytes_cuda121.so.
The rest is just noise saying the CUDA libraries aren't anywhere on LD_LIBRARY_PATH and couldn't be found on the system at all (but in reality they are there; PyTorch ships with them).
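
As a quick sketch (not part of the original report), you can confirm this directly: the version bitsandbytes keys off is the one PyTorch was built with, not the driver's "CUDA Version" from nvidia-smi.

```bash
# Print the CUDA version PyTorch was compiled against and the GPU's compute capability;
# "12.1" here corresponds to CUDA_VERSION=121 and libbitsandbytes_cuda121.so.
python -c "import torch; print(torch.version.cuda, torch.cuda.get_device_capability())"
```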

I think these comments from @matthewdouglas give valuable context for understanding what's going on, and he seems to be spot on. Thanks for the valuable input!

Please follow the instructions he outlined and report back to us.

@Jebati
Author

Jebati commented Mar 27, 2024

(screenshot attached)
It seems to have worked!

Thanks!

@dsidorenkoSU

dsidorenkoSU commented Apr 1, 2024

I am getting this while compiling with CUDA 12.4

```
(base) daemon4d_us@instance-20240401-032345:~/bitsandbytes$ make
[ 14%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/common.cpp.o
[ 28%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/cpu_ops.cpp.o
[ 42%] Building CXX object CMakeFiles/bitsandbytes.dir/csrc/pythonInterface.cpp.o
[ 57%] Building CUDA object CMakeFiles/bitsandbytes.dir/csrc/ops.cu.o
nvcc fatal   : Unsupported gpu architecture 'compute_35'
make[2]: *** [CMakeFiles/bitsandbytes.dir/build.make:118: CMakeFiles/bitsandbytes.dir/csrc/ops.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/bitsandbytes.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
(base) daemon4d_us@instance-20240401-032345:~/bitsandbytes$
```

Here is my GPU info:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   61C    P0             30W /   70W |       0MiB /  15360MiB |      8%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

@matthewdouglas
Member

Hi @dsidorenkoSU,
It looks like you're trying to build with support for Kepler GPUs (compute capability 3.5), which was removed in CUDA 12. When configuring CMake, set -DCOMPUTE_CAPABILITY=75 to target just your T4.
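
For example, a sketch of the configure step with that flag (assuming the same CMake flow as a normal source build):

```bash
# Sketch: reconfigure and rebuild, targeting only compute capability 7.5 (Tesla T4).
cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY=75 -S .
make
pip install .
```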

@dsidorenkoSU

@matthewdouglas This works. I appreciate your help.
