Notes on CUDA Installation #861

Open
kaigai opened this issue Dec 11, 2024 · 3 comments

@kaigai
Contributor

kaigai commented Dec 11, 2024

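Recently I got burned by things like choosing which driver to install alongside CUDA, and the kernel being updated along with it as a dependency, so I'd like to write this up as a document somewhere.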

@ytooyama
Contributor

ytooyama commented Dec 25, 2024

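Just an idea, but:

If you use the CUDA local repository package, that local repository also contains the GPU driver, so I think compatibility problems could be avoided. The same applies not only to RHEL but to other distributions as well. If you also install the dkms package explicitly, dkms autobuild takes care of the CUDA driver modules (and only those), so updating the Linux kernel can be handled too. MOFED still has to be reinstalled, though.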

[cloud-user@kujira ~]$ rpm -ql cuda-repo-rhel9-12-6-local-12.6.3_560.35.05-1.x86_64.rpm
warning: cuda-repo-rhel9-12-6-local-12.6.3_560.35.05-1.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID e1329fa8: NOKEY
/etc/yum.repos.d/cuda-rhel9-12-6-local.repo
/var/cuda-repo-rhel9-12-6-local
/var/cuda-repo-rhel9-12-6-local/E1329FA8.pub
/var/cuda-repo-rhel9-12-6-local/Local.md5
/var/cuda-repo-rhel9-12-6-local/Local.md5.gpg
/var/cuda-repo-rhel9-12-6-local/cuda-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-E1329FA8-keyring.gpg
/var/cuda-repo-rhel9-12-6-local/cuda-cccl-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-command-line-tools-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-compat-12-6-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-compiler-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-crt-12-6-12.6.85-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-cudart-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-cudart-devel-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-cuobjdump-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-cupti-12-6-12.6.80-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-cuxxfilt-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-demo-suite-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-documentation-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-driver-devel-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-drivers-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-drivers-fabricmanager-560-560.35.05-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-drivers-fabricmanager-560.35.05-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-gdb-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-gdb-src-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-libraries-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-libraries-devel-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-minimal-build-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nsight-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nsight-compute-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nsight-systems-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nvcc-12-6-12.6.85-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nvdisasm-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nvml-devel-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nvprof-12-6-12.6.80-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nvprune-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nvrtc-12-6-12.6.85-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nvrtc-devel-12-6-12.6.85-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nvtx-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nvvm-12-6-12.6.85-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-nvvp-12-6-12.6.80-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-opencl-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-opencl-devel-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-profiler-api-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-runtime-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-sanitizer-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-toolkit-12-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-toolkit-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-toolkit-12-6-config-common-12.6.77-1.noarch.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-toolkit-12-config-common-12.6.77-1.noarch.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-toolkit-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-toolkit-config-common-12.6.77-1.noarch.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-tools-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/cuda-visual-tools-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/dnf-plugin-nvidia-2.2-1.el9.noarch.rpm
/var/cuda-repo-rhel9-12-6-local/egl-gbm-1.1.2^20240919gitb24587d-3.el9.i686.rpm
/var/cuda-repo-rhel9-12-6-local/egl-gbm-1.1.2^20240919gitb24587d-3.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/egl-wayland-1.1.13.1-3.el9.i686.rpm
/var/cuda-repo-rhel9-12-6-local/egl-wayland-1.1.13.1-3.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/egl-wayland-devel-1.1.13.1-3.el9.i686.rpm
/var/cuda-repo-rhel9-12-6-local/egl-wayland-devel-1.1.13.1-3.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/egl-x11-1.0.0^20240916gitf13be94-1.el9.i686.rpm
/var/cuda-repo-rhel9-12-6-local/egl-x11-1.0.0^20240916gitf13be94-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/eglexternalplatform-devel-1.2-2.el9.noarch.rpm
/var/cuda-repo-rhel9-12-6-local/gds-tools-12-6-1.11.1.6-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/kmod-nvidia-latest-dkms-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/kmod-nvidia-open-dkms-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libcublas-12-6-12.6.4.1-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libcublas-devel-12-6-12.6.4.1-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libcufft-12-6-11.3.0.4-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libcufft-devel-12-6-11.3.0.4-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libcufile-12-6-1.11.1.6-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libcufile-devel-12-6-1.11.1.6-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libcurand-12-6-10.3.7.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libcurand-devel-12-6-10.3.7.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libcusolver-12-6-11.7.1.2-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libcusolver-devel-12-6-11.7.1.2-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libcusparse-12-6-12.5.4.2-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libcusparse-devel-12-6-12.5.4.2-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnpp-12-6-12.3.1.54-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnpp-devel-12-6-12.3.1.54-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnvfatbin-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnvfatbin-devel-12-6-12.6.77-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnvidia-cfg-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnvidia-fbc-560.35.05-1.el9.i686.rpm
/var/cuda-repo-rhel9-12-6-local/libnvidia-fbc-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnvidia-ml-560.35.05-1.el9.i686.rpm
/var/cuda-repo-rhel9-12-6-local/libnvidia-ml-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnvidia-nscq-560-560.35.05-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnvjitlink-12-6-12.6.85-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnvjitlink-devel-12-6-12.6.85-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnvjpeg-12-6-12.3.3.54-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnvjpeg-devel-12-6-12.3.3.54-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/libnvsdm-560-560.35.05-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/modules.yaml
/var/cuda-repo-rhel9-12-6-local/nsight-compute-2024.3.2-2024.3.2.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nsight-systems-2024.5.1-2024.5.1.113_3461954-0.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-driver-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-driver-cuda-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-driver-cuda-libs-560.35.05-1.el9.i686.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-driver-cuda-libs-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-driver-libs-560.35.05-1.el9.i686.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-driver-libs-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-fabric-manager-560.35.05-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-fabric-manager-devel-560.35.05-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-fs-2.22.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-fs-dkms-2.22.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-gds-12-6-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-gds-12.6.3-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-imex-560-560.35.05-1.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-kmod-common-560.35.05-1.el9.noarch.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-kmod-headers-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-kmod-source-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-libXNVCtrl-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-libXNVCtrl-devel-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-modprobe-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-open-560-560.35.05-1.el9.noarch.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-open-560.35.05-1.el9.noarch.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-persistenced-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-settings-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/nvidia-xconfig-560.35.05-1.el9.x86_64.rpm
/var/cuda-repo-rhel9-12-6-local/repodata
/var/cuda-repo-rhel9-12-6-local/repodata/0656117c5424d013deb8e0dd714ef05939d067dacadcdbd5a618607fe5ffdd7f-other.xml.gz
/var/cuda-repo-rhel9-12-6-local/repodata/0c56590cb379a815695ca9296d71715938452e663d10fcb46177b68a4f7fe0bf-filelists.sqlite.bz2
/var/cuda-repo-rhel9-12-6-local/repodata/14f181d05430dad1529d379514508e74ec4e82c69434182fbbc3e59e065515de-primary.sqlite.bz2
/var/cuda-repo-rhel9-12-6-local/repodata/20f02eae29ea7138d227d20255ceb14dbf33e09f777a9c13709d3d46e74f19e2-modules.yaml.gz
/var/cuda-repo-rhel9-12-6-local/repodata/4a8ba40fa79d901dadff80d1d3256059c0a41585e103649fbb3fa4e0c27a1287-primary.xml.gz
/var/cuda-repo-rhel9-12-6-local/repodata/99169b0de38a61d08cbc2f54a8e6a25ee24e64170da3460f66411bfbb18fb3a7-other.sqlite.bz2
/var/cuda-repo-rhel9-12-6-local/repodata/a45809465aef5d05b8821b1ff42dab2f618caff466b63f94ca029d1a5da6bd6a-filelists.xml.gz
/var/cuda-repo-rhel9-12-6-local/repodata/repomd.xml
/var/cuda-repo-rhel9-12-6-local/repodata/repomd.xml.asc
/var/cuda-repo-rhel9-12-6-local/repodata/repomd.xml.key
/var/cuda-repo-rhel9-12-6-local/tmp
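The listing shows that the local repository does ship the driver packages (`nvidia-driver-560.35.05`, `kmod-nvidia-latest-dkms-560.35.05`, `kmod-nvidia-open-dkms-560.35.05`) next to the toolkit. Below is a minimal sketch for extracting the bundled driver version from such a listing. It assumes the RHEL-style file names shown above. The function name `bundled_driver_version` is hypothetical. The usage line uses `rpm -qlp`, which queries an uninstalled package file.

```shell
#!/usr/bin/env bash
# Sketch: print the NVIDIA driver version bundled in a CUDA local repo package.
# Assumes RHEL-style names such as .../nvidia-driver-560.35.05-1.el9.x86_64.rpm.
# bundled_driver_version is a hypothetical helper name, not an NVIDIA tool.
bundled_driver_version() {
  # Read a file listing on stdin and keep only the base nvidia-driver package.
  # nvidia-driver-cuda, nvidia-driver-libs, ... are skipped because a digit
  # must directly follow "nvidia-driver-".
  # If several versions appear, print the highest one (GNU sort -V).
  sed -n 's|.*/nvidia-driver-\([0-9][0-9.]*\)-[^/]*\.rpm$|\1|p' | sort -uV | tail -n 1
}

# Usage:
#   rpm -qlp cuda-repo-rhel9-12-6-local-12.6.3_560.35.05-1.x86_64.rpm | bundled_driver_version
```

The repo package file name itself already encodes the same pairing (`12.6.3_560.35.05`). Reading the listing confirms what will actually be installed from it.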

@kaigai
Contributor Author

kaigai commented Dec 25, 2024

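Even with the local package, there are pitfalls such as the latest kernel being installed as a dependency, so I don't think it's wise to "hide" these details too much.
I think we should state which driver version goes with each CUDA Toolkit release and have users install it explicitly.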
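One way to make the toolkit-to-driver pairing explicit is to check that the installed driver meets a stated minimum. The sketch below does that comparison. It assumes GNU `sort -V` (coreutils). It uses 560.35.05, the driver bundled in the 12.6.3 local repository listed above, only as an example value; the authoritative minimum for each toolkit release is in NVIDIA's CUDA Toolkit release notes. The function name `driver_at_least` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: succeed if an installed NVIDIA driver version is >= a required one.
# driver_at_least is a hypothetical helper name.
driver_at_least() {
  local installed=$1 required=$2
  # sort -V orders version strings component by component (560.35.05 < 560.35.10).
  # If the smaller of the two is the required version, installed >= required.
  [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)" = "$required" ]
}

# Usage on a host with the driver loaded (nvidia-smi prints one line per GPU):
#   installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_at_least "$installed" 560.35.05 || echo "driver $installed is older than 560.35.05" >&2
```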

@ytooyama
Contributor

As long as I'm using RHEL + EUS, I haven't run into anything that serious. If you only track the regular releases, though, updates can sometimes trip you up.
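Whichever release stream is used, it is worth confirming after a kernel update that dkms actually rebuilt the NVIDIA module for the new kernel before rebooting into it. Below is a minimal sketch that reads `dkms status` output. Several details are assumptions to verify with `dkms status` on your own host:

- It accepts two line formats: dkms 3.x `module/version, kernel, arch: state` and dkms 2.x `module, version, kernel, arch: state`.
- It uses the module name `nvidia` for the `kmod-nvidia-latest-dkms` package.

`dkms_built_for` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: succeed if `dkms status` (on stdin) shows MODULE installed for KERNEL.
# dkms_built_for is a hypothetical helper; the line formats are assumptions.
dkms_built_for() {
  local module=$1 kver=$2
  # 1) keep lines for exactly this module ("nvidia/" or "nvidia,", not "nvidia-fs/"),
  # 2) keep the line for the given kernel release,
  # 3) require the "installed" state (not just "added" or "built").
  grep -E "^${module}[/,]" | grep -F ", ${kver}, " | grep -q ': installed'
}

# Usage: check the kernel grubby will boot next (RHEL), or use "$(uname -r)" after rebooting.
#   next=$(grubby --default-kernel); next=${next#/boot/vmlinuz-}
#   dkms status | dkms_built_for nvidia "$next" || echo "nvidia module is not installed for $next" >&2
```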
