|
1 |
| -# NVVM IR Rustc codegen |
| 1 | +# Rust CUDA |
2 | 2 |
|
3 |
| -This crate provides a codegen backend for rustc that generates [NVVM IR](https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html), a specialized subset of LLVM IR |
4 |
| -used to write high performance GPU code for Nvidia GPUs. |
5 |
| - |
6 |
| -## FAQ |
7 |
| - |
8 |
| -### Are kernels written in Rust faster/slower than CUDA C/C++ kernels? |
9 |
| - |
10 |
| -In theory, they are the same because NVCC uses libnvvm internally. In practice, they could |
11 |
| -be slower or faster just like regular CPU code can be faster or slower based on how LLVM/NVVM optimizes it. |
12 |
| -Rust kernels are likely to perform faster because of the many compiler hints given to NVVM. |
13 |
| - |
14 |
| -### What is NVVM IR/libnvvm? |
15 |
| - |
16 |
| -For compiling gpu kernels, NVCC (Nvidia cuda compiler) separates your CPU (host) and GPU (device) |
17 |
| -code and compiles it separately. The host code is given to a regular C/C++ compiler to compile to |
18 |
| -object files. The device code is converted into NVVM IR, NVVM IR is a subset of LLVM IR (LLVM IR with restrictions). |
19 |
| -This IR is given to a library called libnvvm (nvvm64_40_0.dll). |
20 |
| - |
21 |
| -Libnvvm takes in this IR and it first runs GPU specific optimizations on it. Then, it runs the regular |
22 |
| -LLVM optimizations on it. Finally, it converts it into a PTX (Parallel Thread eXecution), essentially GPU |
23 |
| -assembly. Finally, you take that PTX file and run it using the CUDA Driver API. |
24 |
| - |
25 |
| -TLDR: libnvvm is a library that takes a subset of LLVM IR and converts it to runnable gpu kernels. |
26 |
| - |
27 |
| -### If libnvvm takes a subset of LLVM IR, why not just use rustc_codegen_llvm? |
28 |
| - |
29 |
| -While NVVM IR is a subset of LLVM IR, it is a pretty limited one. Many things are not supported |
30 |
| -and should not be generated, including things like atomics, comdats, many function attrs, unwinding, |
31 |
| -stack probes, etc. Therefore the existing codegen will almost always generate invalid NVVM IR. |
32 |
| - |
33 |
| -Moreover, NVVM IR requires special handling of a lot of things. For example, you must mark |
34 |
| -kernel functions explicitly using named metadata: |
35 |
| - |
36 |
| -```llvm |
37 |
| -!nvvm.annotations = !{!12} |
38 |
| - !12 = !{void ()* @simple_kernel, !"kernel", i32 1}
39 |
| -``` |
40 |
| - |
41 |
| -And adding this to existing generated IR would be exceedingly difficult. |
42 |
| - |
43 |
| -Finally, the most important reason we cannot do this is that NVVM IR uses LLVM IR 7 (at the time of writing). |
44 |
| -While rustc uses LLVM 12. This makes any bitcode generated by rustc_codegen_llvm utterly incompatible with nvvm. |
45 |
| - |
46 |
| -### Why not just compile for `nvptx64-nvidia-cuda`? |
47 |
| - |
48 |
| -This is certainly an option as crates like `accel` have shown, however, it has very serious drawbacks which make it not really suitable: |
49 |
| - |
50 |
| -- Due to LLVM dylib limitations, it is not possible to build nvptx crates to ptx files on Windows. |
51 |
| -- NVVM is much more different than LLVM's PTX backend, it includes specialized optimizations that |
52 |
| -are required to make Rust match CUDA C/C++'s speed. |
53 |
| -- NVVM IR contains GPU-specific IR metadata as well as specialized high-performance math functions through libdevice |
54 |
| -that are more optimized than LLVM's native intrinsics (which aren't even supported in NVVM IR). |
| 3 | +TODO: the entire readme |