Commit 07f66fa

Initial work on supporting some async memory transfers
Experiments with Rust Futures
Implemented derive for RustToCudaAsync
Implemented async kernel launch
Fixed RustToCudaAsync derive
LaunchPackage with non-mut Stream
Moved stream to be an explicit kernel argument
Updated ExchangeWrapperOn[Device|Host]Async::move_to_stream
Upgraded to fixed RustaCuda
Added scratch-space methods for uni-directional CudaExchangeItem
Added unsafe-aliasing API to SplitSliceOverCudaThreads[Const|Dynamic]Stride
Extended the CudaExchangeItem API with scratch and MaybeUninit
Rename SplitSliceOverCudaThreads[Const|Dynamic]Stride::alias_[mut_]unchecked
Implemented #[cuda(crate)] and #[kernel(crate)] attributes
Added simple thread-block shared memory support
Fixed device utils doc tests
Convert cuda thread-block-shared memory address to generic
First steps towards better shared memory, including dynamic
Revert derive changes + R2C-based approach start
Some progress on shared slices
Backup of progress on compile-time PTX checking
Clean up the PTX JIT implementation
Add convenience functions for ThreadBlockShared arrays
Improve and fix CI
Remove broken ThreadBlockShared RustToCuda impl
Refactor kernel trait generation to push more safety constraints to the kernel definition
Fixed SomeCudaAlloc import
Added error handling to the compile-time PTX checking
Add PTX lint parsing, no actual support yet
Added lint checking support to monomorphised kernel impls
Improve kernel checking + added cubin dump lint
Fix kernel macro config parsing
Explicitly fitting Device[Const|Mut]Ref into device registers
Switched one std:: to core::
Remove register-sized CUDA kernel args check, unnecessary since rust-lang/rust#94703
Simplified the kernel parameter layout extraction from PTX
Fix up rebase issues
Install CUDA in all CI steps
Use CStr literals
Simplify and document the safety traits
Fix move_to_cuda bound
Fix clippy for 1.76
Cleaned up the rust-cuda device macros with better print
    The implementation still uses String for dynamic formatting, which currently pulls in loads of formatting and panic machinery. While a custom String type that pre-allocates the exact format String length can avoid some of that, the formatting machinery even for e.g. usize is still large. If `format_args!` is ever optimised for better inlining, the more verbose and lower-level implementation could be reconsidered.
Switch to using more vprintf in embedded CUDA kernel
Make print example fully executable
Clean up the print example
ptr_from_ref is stable from 1.76
Exit on CUDA panic instead of abort to allow the host to handle the error
Backup of early progress for switching from kernel traits to functions
More work into kernel functions instead of traits
Eliminate almost all ArgsTrait usages
Some refactoring of the async kernel func type + wrap code
Early sketch of extracting type wrapping from macro into types and traits
Early work towards using trait for kernel type wrap, ptx jit workaround missing
Lift complete CPU kernel wrapper from proc macro into public functions
Add async launch helper
Further cleanup of the new kernel param API
Start cleaning up the public API
Allow passing ThreadBlockShared to kernels again
Remove unsound mutable lending to CUDA for now
Allow passing ThreadBlockSharedSlice to kernel for dynamic shared memory
Begin refactoring the public API with device feature
Refactoring to prepare for better module structure
Extract kernel module just for parameters
Add RustToCuda impls for &T, &mut T, &[T], and &mut [T] where T: RustToCuda
Large restructuring of the module layout for rust-cuda
Split rust-cuda-kernel off from rust-cuda-derive
Update codecov action to handle rust-cuda-kernel
Fix clippy lint
Far too much time spent getting rid of DeviceCopy
More refactoring and auditing kernel param bounds
First exploration towards a stricter async CUDA API
More experiments with async API
Further API experimentation
Further async API experimentation
Further async API design work
Add RustToCudaAsync impls for &T and &[T], but not &mut T or &mut [T]
Add back mostly unchanged exchange wrapper + buffer with RustToCudaAsync impls
Add back mostly unchanged anti-aliasing types with RustToCudaAsync impls
Progress on replacing ...Async with Async<...>
Seal more implementation details
Further small API improvements
Add AsyncProj helper API struct for async projections
Disable async derive in examples for now
Implement RustToCudaAsync derive impls
Further async API improvements to add drop behaviour
First sketch of the safety constraints of a new NoSafeAliasing trait
First steps towards reintroducing LendToCudaMut
Fix no-std Box import for LendRustToCuda derive
Re-add RustToCuda implementation for Final
Remove redundant RustToCudaAsyncProxy
More progress on less 'static bounds on kernel params
Further investigation of less 'static bounds
Remove 'static bounds from LendToCuda ref kernel params
Make CudaExchangeBuffer Sync
Make CudaExchangeBuffer Sync v2
Add AsyncProj proj_ref and proj_mut convenience methods
Add RustToCudaWithPortableBitCloneSemantics adapter
Fix invalid const fn bounds
Add Deref[Mut] to the adapters
Fix pointer type inference error
Try removing __rust_cuda_ffi_safe_assert module
Ensure async launch mutable borrow safety with barriers on use and stream move
Fix uniqueness guarantee for Stream using branded types
Try without ref proj
Try add extract ref
Fix doc link
clean up kernel signature check
Some cleanup before merging
Fix some clippy lints, add FIXMEs for others
Add docs for rust-cuda-derive
Small refactoring + added docs for rust-cuda-kernel
Bump MSRV to 1.77-nightly
Try trait-based kernel signature check
Try naming host kernel layout const
Try match against byte literal for faster comparison
Try with memcmp intrinsic
Try out experimental const-type-layout with compression
Try check
Try check again
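The note on the print macros mentions a custom String type that pre-allocates the exact format String length. A minimal host-side sketch of that idea follows, assuming a two-pass approach; `LenCounter` and `format_exact` are hypothetical names for illustration and are not taken from rust-cuda. Both passes still go through the regular formatting machinery, which is the cost the note says remains.

```rust
use std::fmt::{self, Write};

/// Hypothetical helper: counts the bytes written instead of storing them.
struct LenCounter(usize);

impl Write for LenCounter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0 += s.len();
        Ok(())
    }
}

/// Hypothetical helper: formats `args` into a `String` whose capacity is
/// reserved up front, so the second pass does not need to grow the buffer.
fn format_exact(args: fmt::Arguments<'_>) -> String {
    // First pass: measure the formatted length without storing any text.
    let mut counter = LenCounter(0);
    counter.write_fmt(args).expect("formatting failed");

    // Second pass: write into a buffer reserved for that length.
    let mut out = String::with_capacity(counter.0);
    out.write_fmt(args).expect("formatting failed");
    out
}

fn main() {
    let msg = format_exact(format_args!("thread {} of {}", 3usize, 32usize));
    println!("{msg}");
}
```

`fmt::Arguments` is `Copy`, which is what allows the same arguments to be formatted twice.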
1 parent f395253 commit 07f66fa

134 files changed, +11367 -5131 lines changed


.github/workflows/ci.yml

+45 -86
@@ -23,6 +23,16 @@ jobs:
         rust: [nightly]
 
     steps:
+      - name: Install CUDA
+        run: |
+          wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
+          sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
+          curl -L -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb
+          sudo dpkg -i cuda-keyring_1.0-1_all.deb
+          sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
+          sudo apt-get update -q
+          sudo apt-get install cuda -y --no-install-recommends
+
       - name: Checkout the Repository
         uses: actions/checkout@v2
 
@@ -40,53 +50,26 @@ jobs:
           sudo ./llvm.sh $(rustc --version -v | grep -oP "LLVM version: \K\d+")
           rm llvm.sh
           cargo install rust-ptx-linker --git https://github.com/juntyr/rust-ptx-linker --force
+
+      - name: Install cargo-hack
+        uses: taiki-e/install-action@cargo-hack
 
-      - name: Check without features on CPU
-        run: |
-          cargo check
-
-      - name: Check with alloc feature on CPU
-        run: |
-          cargo check \
-            --features alloc
-
-      - name: Check with derive feature on CPU
-        run: |
-          cargo check \
-            --features derive
-
-      - name: Check with host feature on CPU
-        run: |
-          cargo check \
-            --features host
-
-      - name: Check with host,derive,alloc features on CPU
+      - name: Check feature powerset on the CPU
         run: |
-          cargo check \
-            --features host,derive,alloc
+          cargo hack check --feature-powerset --optional-deps \
+            --skip device \
+            --keep-going
 
-      - name: Check without features on CUDA
+      - name: Check feature powerset on CUDA
         run: |
-          cargo check \
+          cargo hack check --feature-powerset --optional-deps \
+            --skip host \
+            --keep-going \
             --target nvptx64-nvidia-cuda
 
-      - name: Check with alloc feature on CUDA
-        run: |
-          cargo check \
-            --target nvptx64-nvidia-cuda \
-            --features alloc
-
-      - name: Check with derive feature on CUDA
-        run: |
-          cargo check \
-            --target nvptx64-nvidia-cuda \
-            --features derive
-
       - name: Check all workspace targets
         run: |
-          cargo check \
-            --workspace \
-            --all-targets
+          cargo check --workspace --all-targets
 
   test:
     name: Test Suite
@@ -154,6 +137,16 @@ jobs:
         rust: [nightly]
 
     steps:
+      - name: Install CUDA
+        run: |
+          wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
+          sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
+          curl -L -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-keyring_1.0-1_all.deb
+          sudo dpkg -i cuda-keyring_1.0-1_all.deb
+          sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
+          sudo apt-get update -q
+          sudo apt-get install cuda -y --no-install-recommends
+
       - name: Checkout the Repository
         uses: actions/checkout@v2
 
@@ -173,58 +166,24 @@ jobs:
           rm llvm.sh
           cargo install rust-ptx-linker --git https://github.com/juntyr/rust-ptx-linker --force
 
-      - name: Check the code style without features on CPU
-        run: |
-          cargo clippy \
-            -- -D warnings
-
-      - name: Check the code style with alloc feature on CPU
-        run: |
-          cargo clippy \
-            --features alloc \
-            -- -D warnings
-
-      - name: Check the code style with derive feature on CPU
-        run: |
-          cargo clippy \
-            --features derive \
-            -- -D warnings
-
-      - name: Check the code style with host feature on CPU
-        run: |
-          cargo clippy \
-            --features host \
-            -- -D warnings
-
-      - name: Check the code style with host,derive,alloc features on CPU
-        run: |
-          cargo clippy \
-            --features host,derive,alloc \
-            -- -D warnings
-
-      - name: Check the code style without features on CUDA
-        run: |
-          cargo clippy \
-            --target nvptx64-nvidia-cuda \
-            -- -D warnings
+      - name: Install cargo-hack
+        uses: taiki-e/install-action@cargo-hack
 
-      - name: Check the code style with alloc feature on CUDA
+      - name: Check feature powerset on the CPU
         run: |
-          cargo clippy \
-            --target nvptx64-nvidia-cuda \
-            --features alloc \
+          cargo hack clippy --feature-powerset --optional-deps \
+            --skip device \
+            --keep-going \
             -- -D warnings
-
-      - name: Check the code style with derive feature on CUDA
+
+      - name: Check feature powerset on CUDA
         run: |
-          cargo clippy \
+          cargo hack clippy --feature-powerset --optional-deps \
+            --skip host \
+            --keep-going \
             --target nvptx64-nvidia-cuda \
-            --features derive \
             -- -D warnings
 
-      - name: Check the code style for all workspace targets
+      - name: Check all workspace targets
         run: |
-          cargo clippy \
-            --workspace \
-            --all-targets \
-            -- -D warnings
+          cargo clippy --workspace --all-targets -- -D warnings
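
The CI change above replaces hand-written feature combinations with `cargo hack --feature-powerset`. As a rough illustration of why that covers more ground, the sketch below enumerates every subset of the feature names listed in this commit's Cargo.toml while skipping one of them. The function `feature_powerset` is a hypothetical helper, and the sketch ignores how cargo-hack itself treats default features and optional dependencies.

```rust
/// Hypothetical helper: every subset of `features`, leaving out names in `skip`.
fn feature_powerset<'a>(features: &[&'a str], skip: &[&str]) -> Vec<Vec<&'a str>> {
    // Drop the skipped features first, as `--skip device` would.
    let kept: Vec<&'a str> = features
        .iter()
        .copied()
        .filter(|f| !skip.iter().any(|s| s == f))
        .collect();

    // Each bit of `mask` decides whether the feature at that index is enabled.
    let mut sets = Vec::with_capacity(1 << kept.len());
    for mask in 0..(1usize << kept.len()) {
        let set = kept
            .iter()
            .enumerate()
            .filter(|&(i, _)| (mask >> i) & 1 == 1)
            .map(|(_, f)| *f)
            .collect();
        sets.push(set);
    }
    sets
}

fn main() {
    // Feature names as they appear in the [features] table of this commit.
    let features = ["derive", "device", "final", "host", "kernel"];
    for set in feature_powerset(&features, &["device"]) {
        println!("cargo check --features \"{}\"", set.join(","));
    }
}
```

With four features left after the skip, this yields 16 combinations, compared with the five CPU combinations the old workflow spelled out by hand.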

.github/workflows/coverage.yml

+1 -1
@@ -56,8 +56,8 @@ jobs:
           ./grcov . -s . --binary-path ./target/debug/deps \
             -t lcov -o coverage.lcov --branch \
             --keep-only "src/*" \
-            --keep-only "rust-cuda-ptx-jit/*" \
             --keep-only "rust-cuda-derive/*" \
+            --keep-only "rust-cuda-kernel/*" \
             --ignore-not-existing \
             --excl-line GRCOV_EXCL_LINE \
             --excl-start GRCOV_EXCL_START \

.github/workflows/rustdoc.yml

+2
@@ -28,6 +28,8 @@ jobs:
         run: |
           RUSTDOCFLAGS="\
             --enable-index-page \
+            --extern-html-root-url const_type_layout=https://docs.rs/const-type-layout/0.2.1/ \
+            --extern-html-root-url final=https://docs.rs/final/0.1.1/ \
             --extern-html-root-url rustacuda=https://docs.rs/rustacuda/0.1.3/ \
             --extern-html-root-url rustacuda_core=https://docs.rs/rustacuda_core/0.1.2/ \
             --extern-html-root-url rustacuda_derive=https://docs.rs/rustacuda_derive/0.1.2/ \

.gitignore

+3
@@ -8,3 +8,6 @@ Cargo.lock
 
 # These are backup files generated by rustfmt
 **/*.rs.bk
+
+# cargo expand dev output files
+**/expanded.rs

.vscode/settings.json

+7 -1
@@ -4,5 +4,11 @@
     "rust-analyzer.updates.askBeforeDownload": false,
     "rust-analyzer.checkOnSave.command": "reap-clippy",
     "rust-analyzer.cargo.allFeatures": false,
-    "rust-analyzer.cargo.features": ["alloc", "derive", "host"],
+    "rust-analyzer.cargo.features": [
+        "derive",
+        "final",
+        "host",
+        "kernel"
+    ],
+    "rust-analyzer.showUnlinkedFileNotification": false,
 }

Cargo.toml

+19 -16
@@ -1,10 +1,10 @@
 [workspace]
 members = [
-    ".", "rust-cuda-derive", "rust-cuda-ptx-jit",
-    "examples/single-source", "examples/derive",
+    ".", "rust-cuda-derive", "rust-cuda-kernel",
+    "examples/derive", "examples/print", "examples/single-source",
 ]
 default-members = [
-    ".", "rust-cuda-derive", "rust-cuda-ptx-jit"
+    ".", "rust-cuda-derive", "rust-cuda-kernel",
 ]
 
 [package]
@@ -19,23 +19,26 @@ rust-version = "1.79" # nightly
 
 [features]
 default = []
-alloc = ["hashbrown"]
-host = ["rustacuda", "rust-cuda-ptx-jit/host"]
-derive = ["rustacuda_derive", "rust-cuda-derive"]
+derive = ["dep:rustacuda_derive", "dep:rust-cuda-derive"]
+device = []
+final = ["dep:final"]
+host = ["dep:rustacuda", "dep:regex", "dep:oneshot", "dep:safer_owning_ref"]
+kernel = ["dep:rust-cuda-kernel"]
 
 [dependencies]
-rustacuda_core = "0.1.2"
+rustacuda_core = { git = "https://github.com/juntyr/RustaCUDA", rev = "c6ea7cc" }
 
-rustacuda = { version = "0.1.3", optional = true }
-rustacuda_derive = { version = "0.1.2", optional = true }
+rustacuda = { git = "https://github.com/juntyr/RustaCUDA", rev = "c6ea7cc", optional = true }
+rustacuda_derive = { git = "https://github.com/juntyr/RustaCUDA", rev = "c6ea7cc", optional = true }
 
-const-type-layout = { version = "0.3.0", features = ["derive"] }
+regex = { version = "1.10", optional = true }
 
-final = "0.1.1"
-hashbrown = { version = "0.14", default-features = false, features = ["inline-more"], optional = true }
+const-type-layout = { git = "https://github.com/juntyr/const-type-layout", branch = "compress", features = ["derive"] }
 
-rust-cuda-derive = { path = "rust-cuda-derive", optional = true }
-rust-cuda-ptx-jit = { path = "rust-cuda-ptx-jit" }
+safer_owning_ref = { version = "0.5", optional = true }
+oneshot = { version = "0.1", optional = true, features = ["std", "async"] }
+
+final = { version = "0.1.1", optional = true }
 
-[dev-dependencies]
-hashbrown = { version = "0.14", default-features = false, features = ["inline-more"] }
+rust-cuda-derive = { path = "rust-cuda-derive", optional = true }
+rust-cuda-kernel = { path = "rust-cuda-kernel", optional = true }

README.md

+4 -1
@@ -1,8 +1,11 @@
-# rust-cuda &emsp; [![CI Status]][workflow] [![Rust Doc]][docs] [![License Status]][fossa] [![Code Coverage]][codecov] [![Gitpod Ready-to-Code]][gitpod]
+# rust-cuda &emsp; [![CI Status]][workflow] [![MSRV]][repo] [![Rust Doc]][docs] [![License Status]][fossa] [![Code Coverage]][codecov] [![Gitpod Ready-to-Code]][gitpod]
 
 [CI Status]: https://img.shields.io/github/actions/workflow/status/juntyr/rust-cuda/ci.yml?branch=main
 [workflow]: https://github.com/juntyr/rust-cuda/actions/workflows/ci.yml?query=branch%3Amain
 
+[MSRV]: https://img.shields.io/badge/MSRV-1.77.0--nightly-orange
+[repo]: https://github.com/juntyr/rust-cuda
+
 [Rust Doc]: https://img.shields.io/badge/docs-main-blue
 [docs]: https://juntyr.github.io/rust-cuda/
 

examples/derive/Cargo.toml

+2 -3
@@ -1,12 +1,11 @@
 [package]
 name = "derive"
 version = "0.1.0"
-authors = ["Juniper Tyree <juniper.langenstein@helsinki.fi>"]
+authors = ["Juniper Tyree <juniper.tyree@helsinki.fi>"]
 license = "MIT OR Apache-2.0"
 edition = "2021"
 
 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
 
 [dependencies]
-const-type-layout = { version = "0.3.0" }
-rust-cuda = { path = "../../", features = ["derive", "host"] }
+rc = { package = "rust-cuda", path = "../../", features = ["derive", "host"] }

examples/derive/src/lib.rs

+4 -2
@@ -1,13 +1,15 @@
 #![deny(clippy::pedantic)]
 #![feature(const_type_name)]
 
-#[derive(rust_cuda::common::LendRustToCuda)]
+#[derive(rc::lend::LendRustToCuda)]
+#[cuda(crate = "rc")]
 struct Inner<T: Copy> {
     #[cuda(embed)]
     inner: T,
 }
 
-#[derive(rust_cuda::common::LendRustToCuda)]
+#[derive(rc::lend::LendRustToCuda)]
+#[cuda(crate = "rc")]
 struct Outer<T: Copy> {
     #[cuda(embed)]
     inner: Inner<T>,

examples/print/.cargo/config.toml

+2
@@ -0,0 +1,2 @@
+[target.nvptx64-nvidia-cuda]
+rustflags = ["-Clink-args=--arch sm_35", "-Clinker-plugin-lto", "-Ccodegen-units=1", "-Clink-arg=-O3", "-Clink-arg=--lto"]

examples/print/Cargo.toml

+14
@@ -0,0 +1,14 @@
+[package]
+name = "print"
+version = "0.1.0"
+authors = ["Juniper Tyree <[email protected]>"]
+license = "MIT OR Apache-2.0"
+edition = "2021"
+
+# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
+
+[target.'cfg(target_os = "cuda")'.dependencies]
+rust-cuda = { path = "../../", features = ["kernel", "device"] }
+
+[target.'cfg(not(target_os = "cuda"))'.dependencies]
+rust-cuda = { path = "../../", features = ["kernel", "host"] }
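
The manifest above selects different rust-cuda features depending on whether the crate is compiled for the CUDA target or for the host. The same `cfg(target_os = "cuda")` predicate can be used in source code; the sketch below mirrors that split with a hypothetical function, `selected_features`, that just reports which feature pair the manifest would enable. It is an illustration of the `cfg` mechanism, not code from the print example.

```rust
/// Hypothetical helper: the feature pair enabled when compiling for CUDA.
#[cfg(target_os = "cuda")]
fn selected_features() -> [&'static str; 2] {
    ["kernel", "device"]
}

/// Hypothetical helper: the feature pair enabled when compiling for the host.
#[cfg(not(target_os = "cuda"))]
fn selected_features() -> [&'static str; 2] {
    ["kernel", "host"]
}

fn main() {
    // Exactly one of the two definitions above exists in any given build.
    println!("{:?}", selected_features());
}
```

Because the two `cfg` predicates are mutually exclusive, a single source file can serve both the host build and the `nvptx64-nvidia-cuda` build.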
