About ge-spmm's pytorch-custom #10

Open
GugaGugaGuga opened this issue Dec 2, 2021 · 12 comments
Comments

@GugaGugaGuga

When I run "python3.8 gcn_custom_2layer.py --n-hidden=32", the following error occurs:

Using /tmp/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /tmp/torch_extensions/spmm/build.ninja...
Building extension module spmm...
[1/3] c++ -MMD -MF spmm.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/wjy/Documents/ge-spmm-master/pytorch-custom/spmm.cpp -o spmm.o
[2/3] /usr/local/cuda-10.1/bin/nvcc -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++11 -c /home/wjy/Documents/ge-spmm-master/pytorch-custom/spmm_kernel.cu -o spmm_kernel.cuda.o
[3/3] c++ spmm.o spmm_kernel.cuda.o -shared -L/usr/local/cuda-10.1/lib64 -lcudart -o spmm.so
Loading extension module spmm...
Traceback (most recent call last):
  File "gcn_custom_2layer.py", line 9, in <module>
    from op import GCNConv
  File "/home/wjy/Documents/ge-spmm-master/pytorch-custom/op.py", line 6, in <module>
    spmm = load(name='spmm', sources=['spmm.cpp', 'spmm_kernel.cu'], verbose=True)
  File "/home/wjy/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 670, in load
    return _jit_compile(
  File "/home/wjy/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 877, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/wjy/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1088, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/usr/lib/python3.8/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.8/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: /tmp/torch_extensions/spmm/spmm.so: undefined symbol: cusparseCsr2cscEx2

Could you please help me get past this?
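For reference, a quick first check (a sketch, not part of the original report): verify that the toolkit's own cuSPARSE exports the symbol the loader says is missing. The library path below is assumed from the -L/usr/local/cuda-10.1/lib64 flag in the build log; if the unversioned .so is not installed, point at libcusparse.so.10 instead.

# Diagnostic sketch: does the CUDA 10.1 cuSPARSE export cusparseCsr2cscEx2?
import ctypes

cusparse = ctypes.CDLL("/usr/local/cuda-10.1/lib64/libcusparse.so")  # assumed path
print(hasattr(cusparse, "cusparseCsr2cscEx2"))  # expected: True on CUDA 10.1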

@hgyhungry
Owner

It should work with cudatoolkit v10.1. Try this inside your Python environment:
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch

@GugaGugaGuga
Author


I don't have Anaconda; I only have Python 3.8 on Ubuntu. All the previous steps ran through, and only this one is a problem. Are there any other options?

@GugaGugaGuga
Author

wjy@wjy:~/Documents/ge-spmm-master/pytorch-custom$ python3.8
Python 3.8.12 (default, Sep 10 2021, 00:16:05) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.4.0
>>> torch.cuda.is_available()
True

The installation test succeeds, so why does this error occur when running gcn_custom_2layer.py? Please help.

@hgyhungry
Owner

Hi @GugaGugaGuga, the error occurs because cuSPARSE in CUDA 11 and CUDA 10 has different APIs, so what matters is the CUDA Toolkit version. Can you run print(torch.version.cuda) in Python and check whether the output is 10.x?
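For reference, the checks discussed so far can be gathered in one small script (just a convenience sketch; the individual calls already appear in this thread):

# Environment sanity check: the PyTorch build, the CUDA version it was built
# against, and the toolkit that the JIT compiler will pick up.
import os
import torch

print(torch.__version__)            # e.g. 1.4.0
print(torch.version.cuda)           # should be 10.x for this repo
print(torch.cuda.is_available())
print(os.environ.get("CUDA_HOME"))  # where nvcc and the toolkit's cuSPARSE live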

@GugaGugaGuga
Author

Yes, the output is 10.1. I know CUDA 11 and CUDA 10 have different APIs, but I am building your code with CUDA 10.1. Is there any problem with CUDA 10.1?

@hgyhungry
Owner

Sorry, torch.version.cuda does not matter here. The shared library spmm.so is compiled with your system's default CUDA, so if your default nvcc is >= 11 there will be a problem. First check that your system CUDA is the correct version with nvcc --version. To further rule out problems, can you share the output when you execute the script? In particular we need lines like

Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda-10.1/bin/nvcc -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/guyue/anaconda3/lib/python3.8/site-packages/torch/include -isystem /home/guyue/anaconda3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/guyue/anaconda3/lib/python3.8/site-packages/torch/include/TH -isystem /home/guyue/anaconda3/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/guyue/anaconda3/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/guyue/ge-spmm/pytorch-custom/spmm_kernel.cu -o spmm_kernel.cuda.o

when the shared library is JIT-compiled. In my case it goes through /usr/local/cuda-10.1/bin/nvcc, which works fine.

Note that you may need to clean the compilation cache and run again to see this logging; in your case, delete the folder /tmp/torch_extensions/spmm if it is still there.
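A minimal sketch of that clean-and-rebuild step, assuming the default /tmp/torch_extensions root shown in the logs (op.py already calls load() with these arguments):

# Force a fresh JIT build so the full c++/nvcc command lines are logged again.
import os
import shutil
from torch.utils.cpp_extension import load

cache_dir = "/tmp/torch_extensions/spmm"   # extension cache shown in the logs
if os.path.isdir(cache_dir):
    shutil.rmtree(cache_dir)               # drop the cached spmm.so

spmm = load(name='spmm', sources=['spmm.cpp', 'spmm_kernel.cu'], verbose=True)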

@GugaGugaGuga
Author

wjy@wjy:~/Documents/ge-spmm-master/pytorch-custom$ python3.8 gcn_custom.py --n-hidden=32
Using /tmp/torch_extensions as PyTorch extensions root...
Creating extension directory /tmp/torch_extensions/spmm...
Detected CUDA files, patching ldflags
Emitting ninja build file /tmp/torch_extensions/spmm/build.ninja...
Building extension module spmm...
[1/3] c++ -MMD -MF spmm.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/wjy/Documents/ge-spmm-master/pytorch-custom/spmm.cpp -o spmm.o
[2/3] /usr/local/cuda-10.1/bin/nvcc -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/wjy/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++11 -c /home/wjy/Documents/ge-spmm-master/pytorch-custom/spmm_kernel.cu -o spmm_kernel.cuda.o
[3/3] c++ spmm.o spmm_kernel.cuda.o -shared -L/usr/local/cuda-10.1/lib64 -lcudart -o spmm.so
Loading extension module spmm...
Traceback (most recent call last):
  File "gcn_custom.py", line 9, in <module>
    from op import GCNConv
  File "/home/wjy/Documents/ge-spmm-master/pytorch-custom/op.py", line 6, in <module>
    spmm = load(name='spmm', sources=['spmm.cpp', 'spmm_kernel.cu'], verbose=True)
  File "/home/wjy/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 670, in load
    return _jit_compile(
  File "/home/wjy/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 877, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/wjy/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1088, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/usr/lib/python3.8/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.8/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: /tmp/torch_extensions/spmm/spmm.so: undefined symbol: cusparseCsr2cscEx2

wjy@wjy:~/Downloads$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105

Following your answer, I deleted the folder /tmp/torch_extensions/spmm, but the error is still the same.

@hgyhungry
Owner

I cannot reproduce the error... Does your LD_LIBRARY_PATH include /usr/local/cuda-10.1/lib64?
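One way to see which cuSPARSE the process actually resolves at load time (a Linux-only sketch, not something run in this thread): import torch, then scan the process memory map for every libcusparse that has been loaded.

# List every libcusparse mapped into this process; the undefined-symbol error
# points at whichever copy the dynamic loader picks when spmm.so is imported.
import torch  # noqa: F401 -- importing torch loads its CUDA libraries

with open("/proc/self/maps") as maps:
    libs = {line.split()[-1] for line in maps if "libcusparse" in line}

for path in sorted(libs):
    print(path)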

@GugaGugaGuga
Author

export PATH="/usr/local/cuda-10.1/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH"
export CUDA_HOME="/usr/local/cuda-10.1"

Yes, it does.

@hgyhungry
Owner

Since I cannot reproduce the environment problem, I suggest you use a Docker image that I have tested; that is the easiest way. The image pytorch/pytorch:1.4-cuda10.1-cudnn7-devel works for this repo.

@GugaGugaGuga
Author

Do I need to download cudnn, and if so, what version?

@hgyhungry
Owner

The code does not depend on cuDNN, only cuSPARSE, which comes with the CUDA toolkit (we need version <= 10.1). Again, I suggest using Docker to solve the environment problem; the pytorch/pytorch:1.4-cuda10.1-cudnn7-devel image should work.
