Commit eba6c37

API redesign with async support (#18)
* Initial work on supporting some async memory transfers
  Experiments with Rust Futures
  Implemented derive for RustToCudaAsync
  Implemented async kernel launch
  Fixed RustToCudaAsync derive
  LaunchPackage with non-mut Stream
  Moved stream to be an explicit kernel argument
  Updated ExchangeWrapperOn[Device|Host]Async::move_to_stream
  Upgraded to fixed RustaCuda
  Added scratch-space methods for uni-directional CudaExchangeItem
  Added unsafe-aliasing API to SplitSliceOverCudaThreads[Const|Dynamic]Stride
  Extended the CudaExchangeItem API with scratch and uMaybeUninit
  Rename SplitSliceOverCudaThreads[Const|Dynamic]Stride::alias_[mut_]unchecked
  Implemented #[cuda(crate)] and #[kernel(crate)] attributes
  Added simple thread-block shared memory support
  Fixed device utils doc tests
  Convert cuda thread-block-shared memory address to generic
  First steps towards better shared memory, including dynamic
  Revert derive changes + R2C-based approach start
  Some progress on shared slices
  Backup of progress on compile-time PTX checking
  Clean up the PTX JIT implementation
  Add convenience functions for ThreadBlockShared arrays
  Improve and fix CI
  Remove broken ThreadBlockShared RustToCuda impl
  Refactor kernel trait generation to push more safety constraints to the kernel definition
  Fixed SomeCudaAlloc import
  Added error handling to the compile-time PTX checking
  Add PTX lint parsing, no actual support yet
  Added lint checking support to monomorphised kernel impls
  Improve kernel checking + added cubin dump lint
  Fix kernel macro config parsing
  Explicitly fitting Device[Const|Mut]Ref into device registers
  Switched one std:: to core::
  Remove register-sized CUDA kernel args check, unnecessary since rust-lang/rust#94703
  Simplified the kernel parameter layout extraction from PTX
  Fix up rebase issues
  Install CUDA in all CI steps
  Use CStr literals
  Simplify and document the safety traits
  Fix move_to_cuda bound
  Fix clippy for 1.76
  Cleaned up the rust-cuda device macros with better print
    The implementation still uses String for dynamic formatting, which currently pulls in loads of formatting and panic machinery. While a custom String type that pre-allocated the exact format String length can avoid some of that, the formatting machinery even for e.g. usize is still large. If `format_args!` is ever optimised for better inlining, the more verbose and lower-level implementation could be reconsidered.
  Switch to using more vprintf in embedded CUDA kernel
  Make print example fully executable
  Clean up the print example
  ptr_from_ref is stable from 1.76
  Exit on CUDA panic instead of abort to allow the host to handle the error
  Backup of early progress for switching from kernel traits to functions
  More work into kernel functions instead of traits
  Eliminate almost all ArgsTrait usages
  Some refactoring of the async kernel func type + wrap code
  Early sketch of extracting type wrapping from macro into types and traits
  Early work towards using trait for kernel type wrap, ptx jit workaround missing
  Lift complete CPU kernel wrapper from proc macro into public functions
  Add async launch helper
  Further cleanup of the new kernel param API
  Start cleaning up the public API
  Allow passing ThreadBlockShared to kernels again
  Remove unsound mutable lending to CUDA for now
  Allow passing ThreadBlockSharedSlice to kernel for dynamic shared memory
  Begin refactoring the public API with device feature
  Refactoring to prepare for better module structure
  Extract kernel module just for parameters
  Add RustToCuda impls for &T, &mut T, &[T], and &mut [T] where T: RustToCuda
  Large restructuring of the module layout for rust-cuda
  Split rust-cuda-kernel off from rust-cuda-derive
  Update codecov action to handle rust-cuda-kernel
  Fix clippy lint
  Far too much time spent getting rid of DeviceCopy
  More refactoring and auditing kernel param bounds
  First exploration towards a stricter async CUDA API
  More experiments with async API
  Further API experimentation
  Further async API experimentation
  Further async API design work
  Add RustToCudaAsync impls for &T and &[T], but not &mut T or &mut [T]
  Add back mostly unchanged exchange wrapper + buffer with RustToCudaAsync impls
  Add back mostly unchanged anti-aliasing types with RustToCudaAsync impls
  Progress on replacing ...Async with Async<...>
  Seal more implementation details
  Further small API improvements
  Add AsyncProj helper API struct for async projections
  Disable async derive in examples for now
  Implement RustToCudaAsync derive impls
  Further async API improvements to add drop behaviour
  First sketch of the safety constraints of a new NoSafeAliasing trait
  First steps towards reintroducing LendToCudaMut
  Fix no-std Box import for LendRustToCuda derive
  Re-add RustToCuda implementation for Final
  Remove redundant RustToCudaAsyncProxy
  More progress on less 'static bounds on kernel params
  Further investigation of less 'static bounds
  Remove 'static bounds from LendToCuda ref kernel params
  Make CudaExchangeBuffer Sync
  Make CudaExchangeBuffer Sync v2
  Add AsyncProj proj_ref and proj_mut convenience methods
  Add RustToCudaWithPortableBitCloneSemantics adapter
  Fix invalid const fn bounds
  Add Deref[Mut] to the adapters
  Fix pointer type inference error
  Try removing __rust_cuda_ffi_safe_assert module
  Ensure async launch mutable borrow safety with barriers on use and stream move
  Fix uniqueness guarantee for Stream using branded types
  Try without ref proj
  Try add extract ref
  Fix doc link
  clean up kernel signature check
  Some cleanup before merging
  Fix some clippy lints, add FIXMEs for others
  Add docs for rust-cuda-derive
  Small refactoring + added docs for rust-cuda-kernel
  Bump MSRV to 1.77-nightly
  Try trait-based kernel signature check
  Try naming host kernel layout const
  Try match against byte literal for faster comparison
  Try with memcmp intrinsic
  Try out experimental const-type-layout with compression
  Try check
  Try check again
* Fix CUDA install in CI
* Switch from kernel type signature check to random hash
* Fix CI-identified failures
* Use pinned nightly in CI
* Try splitting the kernel func signature type check
* Try with llvm-bitcode-linker
* Upgrade to latest ptx-builder
* Fix codecov by excluding ptx tests (codecov weirdly overrides linker)
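
The note on the print macros in the commit message mentions a custom String type that pre-allocates the exact format String length. The following is a minimal host-side sketch of that idea, not code from this commit: `LenCounter`, `formatted_len`, and `format_exact` are hypothetical names, and the sketch assumes `std` is available and that formatting the same arguments twice yields the same text.

```rust
use std::fmt::{self, Write};

/// Hypothetical helper: counts the bytes that formatting would emit
/// without storing any of them.
struct LenCounter(usize);

impl Write for LenCounter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0 += s.len();
        Ok(())
    }
}

/// Returns the number of bytes `args` formats to.
fn formatted_len(args: fmt::Arguments<'_>) -> usize {
    let mut counter = LenCounter(0);
    // `LenCounter::write_str` never fails, so an error here can only come
    // from a `Display`/`Debug` impl inside `args`.
    counter.write_fmt(args).expect("formatting failed while counting");
    counter.0
}

/// Formats `args` into a `String` whose capacity is reserved up front,
/// so the second pass should not need to grow the buffer.
fn format_exact(args: fmt::Arguments<'_>) -> String {
    // `fmt::Arguments` is `Copy`, so it can be formatted twice.
    let mut out = String::with_capacity(formatted_len(args));
    out.write_fmt(args).expect("formatting failed while writing");
    out
}
```

A call such as `format_exact(format_args!("thread {} of {}", 3, 128))` runs the formatting machinery twice, which is consistent with the commit's remark that pre-sizing avoids only part of the cost.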
1 parent f395253 commit eba6c37

138 files changed: +11459 -5221 lines changed

.github/workflows/ci.yml

Lines changed: 41 additions & 109 deletions
@@ -23,6 +23,13 @@ jobs:
         rust: [nightly]
 
     steps:
+      - name: Install CUDA
+        uses: Jimver/[email protected]
+        with:
+          method: network
+          use-github-cache: false
+          use-local-cache: false
+
       - name: Checkout the Repository
         uses: actions/checkout@v2
 
@@ -32,61 +39,27 @@ jobs:
           toolchain: ${{ matrix.rust }}
           profile: minimal
           target: nvptx64-nvidia-cuda
-          override: true
-
-      - name: Install the rust-ptx-linker
-        run: |
-          wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh
-          sudo ./llvm.sh $(rustc --version -v | grep -oP "LLVM version: \K\d+")
-          rm llvm.sh
-          cargo install rust-ptx-linker --git https://github.com/juntyr/rust-ptx-linker --force
-
-      - name: Check without features on CPU
-        run: |
-          cargo check
-
-      - name: Check with alloc feature on CPU
-        run: |
-          cargo check \
-            --features alloc
-
-      - name: Check with derive feature on CPU
-        run: |
-          cargo check \
-            --features derive
+          override: false # FIXME
 
-      - name: Check with host feature on CPU
-        run: |
-          cargo check \
-            --features host
+      - name: Install cargo-hack
+        uses: taiki-e/install-action@cargo-hack
 
-      - name: Check with host,derive,alloc features on CPU
+      - name: Check feature powerset on the CPU
         run: |
-          cargo check \
-            --features host,derive,alloc
+          cargo hack check --feature-powerset --optional-deps \
+            --skip device \
+            --keep-going
 
-      - name: Check without features on CUDA
+      - name: Check feature powerset on CUDA
         run: |
-          cargo check \
+          cargo hack check --feature-powerset --optional-deps \
+            --skip host \
+            --keep-going \
             --target nvptx64-nvidia-cuda
 
-      - name: Check with alloc feature on CUDA
-        run: |
-          cargo check \
-            --target nvptx64-nvidia-cuda \
-            --features alloc
-
-      - name: Check with derive feature on CUDA
-        run: |
-          cargo check \
-            --target nvptx64-nvidia-cuda \
-            --features derive
-
       - name: Check all workspace targets
         run: |
-          cargo check \
-            --workspace \
-            --all-targets
+          cargo check --workspace --all-targets
 
   test:
     name: Test Suite
@@ -113,14 +86,7 @@ jobs:
           toolchain: ${{ matrix.rust }}
           profile: minimal
           target: nvptx64-nvidia-cuda
-          override: true
-
-      - name: Install the rust-ptx-linker
-        run: |
-          wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh
-          sudo ./llvm.sh $(rustc --version -v | grep -oP "LLVM version: \K\d+")
-          rm llvm.sh
-          cargo install rust-ptx-linker --git https://github.com/juntyr/rust-ptx-linker --force
+          override: false # FIXME
 
       - name: Run the test-suite
         run: |
@@ -154,6 +120,13 @@ jobs:
         rust: [nightly]
 
     steps:
+      - name: Install CUDA
+        uses: Jimver/[email protected]
+        with:
+          method: network
+          use-github-cache: false
+          use-local-cache: false
+
       - name: Checkout the Repository
         uses: actions/checkout@v2
 
@@ -164,67 +137,26 @@ jobs:
           profile: minimal
           components: clippy
           target: nvptx64-nvidia-cuda
-          override: true
-
-      - name: Install the rust-ptx-linker
-        run: |
-          wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh
-          sudo ./llvm.sh $(rustc --version -v | grep -oP "LLVM version: \K\d+")
-          rm llvm.sh
-          cargo install rust-ptx-linker --git https://github.com/juntyr/rust-ptx-linker --force
-
-      - name: Check the code style without features on CPU
-        run: |
-          cargo clippy \
-            -- -D warnings
-
-      - name: Check the code style with alloc feature on CPU
-        run: |
-          cargo clippy \
-            --features alloc \
-            -- -D warnings
-
-      - name: Check the code style with derive feature on CPU
-        run: |
-          cargo clippy \
-            --features derive \
-            -- -D warnings
+          override: false # FIXME
 
-      - name: Check the code style with host feature on CPU
-        run: |
-          cargo clippy \
-            --features host \
-            -- -D warnings
-
-      - name: Check the code style with host,derive,alloc features on CPU
-        run: |
-          cargo clippy \
-            --features host,derive,alloc \
-            -- -D warnings
-
-      - name: Check the code style without features on CUDA
-        run: |
-          cargo clippy \
-            --target nvptx64-nvidia-cuda \
-            -- -D warnings
+      - name: Install cargo-hack
+        uses: taiki-e/install-action@cargo-hack
 
-      - name: Check the code style with alloc feature on CUDA
+      - name: Check feature powerset on the CPU
         run: |
-          cargo clippy \
-            --target nvptx64-nvidia-cuda \
-            --features alloc \
+          cargo hack clippy --feature-powerset --optional-deps \
+            --skip device \
+            --keep-going \
             -- -D warnings
-
-      - name: Check the code style with derive feature on CUDA
+
+      - name: Check feature powerset on CUDA
         run: |
-          cargo clippy \
+          cargo hack clippy --feature-powerset --optional-deps \
+            --skip host \
+            --keep-going \
             --target nvptx64-nvidia-cuda \
-            --features derive \
             -- -D warnings
 
-      - name: Check the code style for all workspace targets
+      - name: Check all workspace targets
         run: |
-          cargo clippy \
-            --workspace \
-            --all-targets \
-            -- -D warnings
+          cargo clippy --workspace --all-targets -- -D warnings
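
The "Check feature powerset" steps above replace the hand-written per-feature checks with `cargo hack --feature-powerset`, skipping `device` on the CPU and `host` on CUDA. As a rough illustration of what a powerset with a skipped feature covers, here is a small sketch; `feature_powerset` is a hypothetical helper, not cargo-hack's implementation, and it ignores `--optional-deps` and any other combinations cargo-hack may add. The feature names are the ones declared in this commit's `Cargo.toml`.

```rust
/// Hypothetical helper: enumerates every subset of `features` after
/// removing the names listed in `skip`.
fn feature_powerset<'a>(features: &[&'a str], skip: &[&str]) -> Vec<Vec<&'a str>> {
    // Drop the skipped features first, keeping the original order.
    let kept: Vec<&'a str> = features
        .iter()
        .copied()
        .filter(|f| !skip.contains(f))
        .collect();

    // Each bit of `mask` decides whether the feature at that index is enabled.
    (0..(1usize << kept.len()))
        .map(|mask| {
            kept.iter()
                .enumerate()
                .filter(|(i, _)| mask & (1 << i) != 0)
                .map(|(_, f)| *f)
                .collect()
        })
        .collect()
}
```

With the five features `derive`, `device`, `final`, `host`, and `kernel`, skipping `device` leaves four names and therefore sixteen subsets, including the empty one. The removed workflow steps checked only a handful of these combinations.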

.github/workflows/coverage.yml

Lines changed: 8 additions & 10 deletions
@@ -27,19 +27,17 @@ jobs:
           profile: minimal
           components: llvm-tools-preview
           target: nvptx64-nvidia-cuda
-          override: true
-
-      - name: Install the rust-ptx-linker
-        run: |
-          wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh
-          sudo ./llvm.sh $(rustc --version -v | grep -oP "LLVM version: \K\d+")
-          rm llvm.sh
-          cargo install rust-ptx-linker --git https://github.com/juntyr/rust-ptx-linker --force
+          override: false # FIXME
 
       - name: Generate the coverage data
         run: |
           cargo clean
-          cargo test --workspace --all-targets
+          cargo test \
+            --workspace \
+            --all-targets \
+            --exclude derive \
+            --exclude print \
+            --exclude single-source
         env:
           CARGO_INCREMENTAL: 0
           RUSTFLAGS: -Cinstrument-coverage
@@ -56,8 +54,8 @@ jobs:
           ./grcov . -s . --binary-path ./target/debug/deps \
             -t lcov -o coverage.lcov --branch \
             --keep-only "src/*" \
-            --keep-only "rust-cuda-ptx-jit/*" \
             --keep-only "rust-cuda-derive/*" \
+            --keep-only "rust-cuda-kernel/*" \
             --ignore-not-existing \
             --excl-line GRCOV_EXCL_LINE \
             --excl-start GRCOV_EXCL_START \

.github/workflows/rustdoc.yml

Lines changed: 3 additions & 1 deletion
@@ -22,12 +22,14 @@ jobs:
         with:
           toolchain: nightly
           profile: minimal
-          override: true
+          override: false # FIXME
 
       - name: Build the Documentation
         run: |
           RUSTDOCFLAGS="\
             --enable-index-page \
+            --extern-html-root-url const_type_layout=https://docs.rs/const-type-layout/0.3.1/ \
+            --extern-html-root-url final=https://docs.rs/final/0.1.1/ \
             --extern-html-root-url rustacuda=https://docs.rs/rustacuda/0.1.3/ \
             --extern-html-root-url rustacuda_core=https://docs.rs/rustacuda_core/0.1.2/ \
             --extern-html-root-url rustacuda_derive=https://docs.rs/rustacuda_derive/0.1.2/ \

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -8,3 +8,6 @@ Cargo.lock
 
 # These are backup files generated by rustfmt
 **/*.rs.bk
+
+# cargo expand dev output files
+**/expanded.rs

.gitpod.Dockerfile

Lines changed: 8 additions & 10 deletions
@@ -8,18 +8,16 @@ RUN echo "debconf debconf/frontend select Noninteractive" | sudo debconf-set-sel
     echo "keyboard-configuration keyboard-configuration/layout select 'English (US)'" | sudo debconf-set-selections && \
     echo "keyboard-configuration keyboard-configuration/layoutcode select 'us'" | sudo debconf-set-selections && \
     echo "resolvconf resolvconf/linkify-resolvconf boolean false" | sudo debconf-set-selections && \
-    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin && \
-    sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600 && \
-    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub && \
-    sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /" && \
+    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb -O cuda_keyring.deb && \
+    sudo dpkg -i cuda_keyring.deb && \
+    rm cuda_keyring.deb && \
+    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \
+    sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 && \
+    sudo add-apt-repository deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ / && \
     sudo apt-get update -q && \
-    sudo apt-get install cuda -y --no-install-recommends && \
-    wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh && \
-    sudo ./llvm.sh $(rustc --version -v | grep -oP "LLVM version: \K\d+") && \
-    rm llvm.sh && \
+    sudo apt-get install cuda-12-3 -y --no-install-recommends && \
     sudo apt-get clean autoclean && \
     sudo apt-get autoremove -y && \
     sudo rm -rf /var/lib/{apt,dpkg,cache,log}/
 
-RUN cargo install rust-ptx-linker --git https://github.com/juntyr/rust-ptx-linker --force && \
-    cargo install cargo-reaper --git https://github.com/juntyr/grim-reaper --force
+RUN cargo install cargo-reaper --git https://github.com/juntyr/grim-reaper --force

.vscode/settings.json

Lines changed: 7 additions & 1 deletion
@@ -4,5 +4,11 @@
     "rust-analyzer.updates.askBeforeDownload": false,
     "rust-analyzer.checkOnSave.command": "reap-clippy",
     "rust-analyzer.cargo.allFeatures": false,
-    "rust-analyzer.cargo.features": ["alloc", "derive", "host"],
+    "rust-analyzer.cargo.features": [
+        "derive",
+        "final",
+        "host",
+        "kernel"
+    ],
+    "rust-analyzer.showUnlinkedFileNotification": false,
 }

Cargo.toml

Lines changed: 19 additions & 16 deletions
@@ -1,10 +1,10 @@
 [workspace]
 members = [
-    ".", "rust-cuda-derive", "rust-cuda-ptx-jit",
-    "examples/single-source", "examples/derive",
+    ".", "rust-cuda-derive", "rust-cuda-kernel",
+    "examples/derive", "examples/print", "examples/single-source",
 ]
 default-members = [
-    ".", "rust-cuda-derive", "rust-cuda-ptx-jit"
+    ".", "rust-cuda-derive", "rust-cuda-kernel",
 ]
 
 [package]
@@ -19,23 +19,26 @@ rust-version = "1.79" # nightly
 
 [features]
 default = []
-alloc = ["hashbrown"]
-host = ["rustacuda", "rust-cuda-ptx-jit/host"]
-derive = ["rustacuda_derive", "rust-cuda-derive"]
+derive = ["dep:rustacuda_derive", "dep:rust-cuda-derive"]
+device = []
+final = ["dep:final"]
+host = ["dep:rustacuda", "dep:regex", "dep:oneshot", "dep:safer_owning_ref"]
+kernel = ["dep:rust-cuda-kernel"]
 
 [dependencies]
-rustacuda_core = "0.1.2"
+rustacuda_core = { git = "https://github.com/juntyr/RustaCUDA", rev = "c6ea7cc" }
 
-rustacuda = { version = "0.1.3", optional = true }
-rustacuda_derive = { version = "0.1.2", optional = true }
+rustacuda = { git = "https://github.com/juntyr/RustaCUDA", rev = "c6ea7cc", optional = true }
+rustacuda_derive = { git = "https://github.com/juntyr/RustaCUDA", rev = "c6ea7cc", optional = true }
 
-const-type-layout = { version = "0.3.0", features = ["derive"] }
+regex = { version = "1.10", optional = true }
 
-final = "0.1.1"
-hashbrown = { version = "0.14", default-features = false, features = ["inline-more"], optional = true }
+const-type-layout = { version = "0.3.1", features = ["derive"] }
 
-rust-cuda-derive = { path = "rust-cuda-derive", optional = true }
-rust-cuda-ptx-jit = { path = "rust-cuda-ptx-jit" }
+safer_owning_ref = { version = "0.5", optional = true }
+oneshot = { version = "0.1", optional = true, features = ["std", "async"] }
+
+final = { version = "0.1.1", optional = true }
 
-[dev-dependencies]
-hashbrown = { version = "0.14", default-features = false, features = ["inline-more"] }
+rust-cuda-derive = { path = "rust-cuda-derive", optional = true }
+rust-cuda-kernel = { path = "rust-cuda-kernel", optional = true }

README.md

Lines changed: 4 additions & 1 deletion
@@ -1,8 +1,11 @@
-# rust-cuda &emsp; [![CI Status]][workflow] [![Rust Doc]][docs] [![License Status]][fossa] [![Code Coverage]][codecov] [![Gitpod Ready-to-Code]][gitpod]
+# rust-cuda &emsp; [![CI Status]][workflow] [![MSRV]][repo] [![Rust Doc]][docs] [![License Status]][fossa] [![Code Coverage]][codecov] [![Gitpod Ready-to-Code]][gitpod]
 
 [CI Status]: https://img.shields.io/github/actions/workflow/status/juntyr/rust-cuda/ci.yml?branch=main
 [workflow]: https://github.com/juntyr/rust-cuda/actions/workflows/ci.yml?query=branch%3Amain
 
+[MSRV]: https://img.shields.io/badge/MSRV-1.79.0--nightly-orange
+[repo]: https://github.com/juntyr/rust-cuda
+
 [Rust Doc]: https://img.shields.io/badge/docs-main-blue
 [docs]: https://juntyr.github.io/rust-cuda/
 

examples/derive/Cargo.toml

Lines changed: 2 additions & 3 deletions
@@ -1,12 +1,11 @@
 [package]
 name = "derive"
 version = "0.1.0"
-authors = ["Juniper Tyree <juniper.langenstein@helsinki.fi>"]
+authors = ["Juniper Tyree <juniper.tyree@helsinki.fi>"]
 license = "MIT OR Apache-2.0"
 edition = "2021"
 
 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
 
 [dependencies]
-const-type-layout = { version = "0.3.0" }
-rust-cuda = { path = "../../", features = ["derive", "host"] }
+rc = { package = "rust-cuda", path = "../../", features = ["derive", "host"] }

examples/derive/src/lib.rs

Lines changed: 4 additions & 2 deletions
@@ -1,13 +1,15 @@
 #![deny(clippy::pedantic)]
 #![feature(const_type_name)]
 
-#[derive(rust_cuda::common::LendRustToCuda)]
+#[derive(rc::lend::LendRustToCuda)]
+#[cuda(crate = "rc")]
 struct Inner<T: Copy> {
     #[cuda(embed)]
     inner: T,
 }
 
-#[derive(rust_cuda::common::LendRustToCuda)]
+#[derive(rc::lend::LendRustToCuda)]
+#[cuda(crate = "rc")]
 struct Outer<T: Copy> {
     #[cuda(embed)]
     inner: Inner<T>,
