Querent-ai
diff --git a/.cargo/config.toml b/.cargo/config.toml (+2)
diff --git a/.gitignore b/.gitignore (+4)
diff --git a/Cargo.toml b/Cargo.toml (+10)
diff --git a/README.md b/README.md (+2 -53)
diff --git a/crates/cuda_builder/Cargo.toml b/crates/cuda_builder/Cargo.toml (+11)
@@ -0,0 +1,2 @@
+[alias]
+xtask = "run -p xtask --bin xtask --"
@@ -0,0 +1,4 @@
+book
+/target
+Cargo.lock
+**/.vscode
@@ -0,0 +1,10 @@
+[workspace]
+members = [
+  "crates/*",
+  "xtask"
+]
+
+[profile.release]
+debug = 2
+lto = "fat"
+codegen-units = 1
@@ -1,54 +1,3 @@
-# NVVM IR Rustc codegen
+# Rust CUDA
 
-This crate provides a codegen backend for rustc that generates [NVVM IR](https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html), a specialized subset of LLVM IR
-used to write high performance GPU code for Nvidia GPUs.
-
-## FAQ
-
-### Are kernels written in Rust faster/slower than CUDA C/C++ kernels?
-
-In theory, they are the same because NVCC uses libnvvm internally. In practice, they could 
-be slower or faster just like regular CPU code can be faster or slower based on how LLVM/NVVM optimizes it.
-Rust kernels are likely to perform faster because of the many compiler hints given to NVVM.
-
-### What is NVVM IR/libnvvm?
-
-For compiling gpu kernels, NVCC (Nvidia cuda compiler) separates your CPU (host) and GPU (device)
-code and compiles it separately. The host code is given to a regular C/C++ compiler to compile to 
-object files. The device code is converted into NVVM IR, NVVM IR is a subset of LLVM IR (LLVM IR with restrictions).
-This IR is given to a library called libnvvm (nvvm64_40_0.dll).
-
-Libnvvm takes in this IR and it first runs GPU specific optimizations on it. Then, it runs the regular
-LLVM optimizations on it. Finally, it converts it into a PTX (Parallel Thread eXecution), essentially GPU
-assembly. Finally, you take that PTX file and run it using the CUDA Driver API.
-
-TLDR: libnvvm is a library that takes a subset of LLVM IR and converts it to runnable gpu kernels.
-
-### If libnvvm takes a subset of LLVM IR, why not just use rustc_codegen_llvm?
-
-While NVVM IR is a subset of LLVM IR, it is a pretty limited one. Many things are not supported 
-and should not be generated, including things like atomics, comdats, many function attrs, unwinding,
-stack probes, etc. Therefore the existing codegen will almost always generate invalid NVVM IR.
-
-Moreover, NVVM IR requires special handling of a lot of things. For example, you must mark 
-kernel functions explicitly using named metadata:
-
-```llvm
-!nvvm.annotations = !{!12}
-  !12 = !{void ()* @simple_kernel, !"kernel", i32 i}
-```
-
-And adding this to existing generated IR would be exceedingly difficult.
-
-Finally, the most important reason we cannot do this is that NVVM IR uses LLVM IR 7 (at the time of writing).
-While rustc uses LLVM 12. This makes any bitcode generated by rustc_codegen_llvm utterly incompatible with nvvm.
-
-### Why not just compile for `nvptx64-nvidia-cuda`?
-
-This is certainly an option as crates like `accel` have shown, however, it has very serious drawbacks which make it not really suitable:
-
-- Due to LLVM dylib limitations, it is not possible to build nvptx crates to ptx files on Windows.
-- NVVM is much more different than LLVM's PTX backend, it includes specialized optimizations that
-are required to make Rust match CUDA C/C++'s speed.
-- NVVM IR contains GPU-specific IR metadata as well as specialized high-performance math functions through libdevice
-that are more optimized than LLVM's native intrinsics (which aren't even supported in NVVM IR).
+TODO: the entire readme
@@ -0,0 +1,11 @@
+[package]
+name = "cuda_builder"
+version = "0.1.0"
+edition = "2021"
+authors = ["Riccardo D'Ambrosio <[email protected]>", "The rust-gpu Authors"]
+
+[dependencies]
+rustc_codegen_nvvm = { path = "../rustc_codegen_nvvm" }
+nvvm = { path = "../nvvm", version = "0.1" }
+serde = { version = "1.0.130", features = ["derive"] }
+serde_json = "1.0.68"