Commit 2735034

Initial commit
1 parent 2e719bc commit 2735034

150 files changed, +61977 -53 lines changed

.cargo/config.toml

+2
@@ -0,0 +1,2 @@
+[alias]
+xtask = "run -p xtask --bin xtask --"
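With this alias, `cargo xtask <args>` expands to `cargo run -p xtask --bin xtask -- <args>`, so everything after `xtask` reaches the binary as ordinary command-line arguments. The commit does not show what the `xtask` crate contains, so the following is only a minimal sketch of such an entry point; the `run_task` function and the `hello` task are hypothetical names chosen for illustration.

```rust
use std::env;
use std::process::ExitCode;

/// Dispatches on the first argument passed after `cargo xtask`.
/// The task names here are hypothetical; the real xtask crate may differ.
fn run_task(args: &[String]) -> Result<String, String> {
    match args.first().map(String::as_str) {
        Some("hello") => Ok("hello from xtask".to_string()),
        Some(other) => Err(format!("unknown task: {other}")),
        None => Err("usage: cargo xtask <task>".to_string()),
    }
}

fn main() -> ExitCode {
    // Skip the program name; the alias forwards the remaining arguments.
    let args: Vec<String> = env::args().skip(1).collect();
    match run_task(&args) {
        Ok(message) => {
            println!("{message}");
            ExitCode::SUCCESS
        }
        Err(error) => {
            eprintln!("{error}");
            ExitCode::FAILURE
        }
    }
}
```

Keeping the dispatch in a plain function that returns a `Result` makes it testable without spawning a process.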

.gitignore

+4
@@ -0,0 +1,4 @@
+book
+/target
+Cargo.lock
+**/.vscode

Cargo.toml

+10
@@ -0,0 +1,10 @@
+[workspace]
+members = [
+    "crates/*",
+    "xtask"
+]
+
+[profile.release]
+debug = 2
+lto = "fat"
+codegen-units = 1

README.md

+2-53
@@ -1,54 +1,3 @@
-# NVVM IR Rustc codegen
+# Rust CUDA
 
-This crate provides a codegen backend for rustc that generates [NVVM IR](https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html), a specialized subset of LLVM IR
-used to write high performance GPU code for Nvidia GPUs.
-
-## FAQ
-
-### Are kernels written in Rust faster/slower than CUDA C/C++ kernels?
-
-In theory, they are the same because NVCC uses libnvvm internally. In practice, they could
-be slower or faster just like regular CPU code can be faster or slower based on how LLVM/NVVM optimizes it.
-Rust kernels are likely to perform faster because of the many compiler hints given to NVVM.
-
-### What is NVVM IR/libnvvm?
-
-For compiling gpu kernels, NVCC (Nvidia cuda compiler) separates your CPU (host) and GPU (device)
-code and compiles it separately. The host code is given to a regular C/C++ compiler to compile to
-object files. The device code is converted into NVVM IR, NVVM IR is a subset of LLVM IR (LLVM IR with restrictions).
-This IR is given to a library called libnvvm (nvvm64_40_0.dll).
-
-Libnvvm takes in this IR and it first runs GPU specific optimizations on it. Then, it runs the regular
-LLVM optimizations on it. Finally, it converts it into a PTX (Parallel Thread eXecution), essentially GPU
-assembly. Finally, you take that PTX file and run it using the CUDA Driver API.
-
-TLDR: libnvvm is a library that takes a subset of LLVM IR and converts it to runnable gpu kernels.
-
-### If libnvvm takes a subset of LLVM IR, why not just use rustc_codegen_llvm?
-
-While NVVM IR is a subset of LLVM IR, it is a pretty limited one. Many things are not supported
-and should not be generated, including things like atomics, comdats, many function attrs, unwinding,
-stack probes, etc. Therefore the existing codegen will almost always generate invalid NVVM IR.
-
-Moreover, NVVM IR requires special handling of a lot of things. For example, you must mark
-kernel functions explicitly using named metadata:
-
-```llvm
-!nvvm.annotations = !{!12}
-!12 = !{void ()* @simple_kernel, !"kernel", i32 i}
-```
-
-And adding this to existing generated IR would be exceedingly difficult.
-
-Finally, the most important reason we cannot do this is that NVVM IR uses LLVM IR 7 (at the time of writing).
-While rustc uses LLVM 12. This makes any bitcode generated by rustc_codegen_llvm utterly incompatible with nvvm.
-
-### Why not just compile for `nvptx64-nvidia-cuda`?
-
-This is certainly an option as crates like `accel` have shown, however, it has very serious drawbacks which make it not really suitable:
-
-- Due to LLVM dylib limitations, it is not possible to build nvptx crates to ptx files on Windows.
-- NVVM is much more different than LLVM's PTX backend, it includes specialized optimizations that
-are required to make Rust match CUDA C/C++'s speed.
-- NVVM IR contains GPU-specific IR metadata as well as specialized high-performance math functions through libdevice
-that are more optimized than LLVM's native intrinsics (which aren't even supported in NVVM IR).
+TODO: the entire readme
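The removed README text says that kernel functions must be marked through `!nvvm.annotations` named metadata. As a rough illustration of what that involves, the sketch below builds the two textual IR lines for a kernel with signature `void ()`. The helper name `kernel_annotation` is hypothetical and is not part of this commit or of any crate in it. The sketch assumes the typed-pointer syntax of the LLVM 7-era IR that the README mentions, and it writes the flag value as `i32 1`, which is the form the NVVM IR specification uses (the README excerpt shows `i32 i`).

```rust
/// Builds the textual named-metadata lines that mark `kernel_name` as a kernel.
/// `metadata_id` is the numeric id of the metadata node (12 in the README excerpt).
/// Only the `void ()` signature from the README example is handled.
fn kernel_annotation(kernel_name: &str, metadata_id: u32) -> String {
    format!(
        "!nvvm.annotations = !{{!{id}}}\n!{id} = !{{void ()* @{name}, !\"kernel\", i32 1}}",
        id = metadata_id,
        name = kernel_name
    )
}

fn main() {
    // Reproduces the shape of the README example for `simple_kernel`.
    println!("{}", kernel_annotation("simple_kernel", 12));
}
```

A real codegen backend would attach this metadata through the LLVM API rather than by formatting strings; the string form is used here only to make the required shape visible.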

crates/cuda_builder/Cargo.toml

+11
@@ -0,0 +1,11 @@
+[package]
+name = "cuda_builder"
+version = "0.1.0"
+edition = "2021"
+authors = ["Riccardo D'Ambrosio <[email protected]>", "The rust-gpu Authors"]
+
+[dependencies]
+rustc_codegen_nvvm = { path = "../rustc_codegen_nvvm" }
+nvvm = { path = "../nvvm", version = "0.1" }
+serde = { version = "1.0.130", features = ["derive"] }
+serde_json = "1.0.68"
