Initial work on supporting some async memory transfers
Experiments with Rust Futures
Implemented derive for RustToCudaAsync
Implemented async kernel launch
Fixed RustToCudaAsync derive
LaunchPackage with non-mut Stream
Moved stream to be an explicit kernel argument
Updated ExchangeWrapperOn[Device|Host]Async::move_to_stream
Upgraded to fixed RustaCuda
Added scratch-space methods for uni-directional CudaExchangeItem
Added unsafe-aliasing API to SplitSliceOverCudaThreads[Const|Dynamic]Stride
Extended the CudaExchangeItem API with scratch and MaybeUninit
Rename SplitSliceOverCudaThreads[Const|Dynamic]Stride::alias_[mut_]unchecked
Implemented #[cuda(crate)] and #[kernel(crate)] attributes
Added simple thread-block shared memory support
Fixed device utils doc tests
Convert cuda thread-block-shared memory address to generic
First steps towards better shared memory, including dynamic
Revert derive changes + R2C-based approach start
Some progress on shared slices
Backup of progress on compile-time PTX checking
Clean up the PTX JIT implementation
Add convenience functions for ThreadBlockShared arrays
Improve and fix CI
Remove broken ThreadBlockShared RustToCuda impl
Refactor kernel trait generation to push more safety constraints to the kernel definition
Fixed SomeCudaAlloc import
Added error handling to the compile-time PTX checking
Add PTX lint parsing, no actual support yet
Added lint checking support to monomorphised kernel impls
Improve kernel checking + added cubin dump lint
Fix kernel macro config parsing
Explicitly fitting Device[Const|Mut]Ref into device registers
Switched one std:: to core::
Remove register-sized CUDA kernel args check, unnecessary since rust-lang/rust#94703
Simplified the kernel parameter layout extraction from PTX
Fix up rebase issues
Install CUDA in all CI steps
Use CStr literals
Simplify and document the safety traits
Fix move_to_cuda bound
Fix clippy for 1.76
Cleaned up the rust-cuda device macros with better print
The implementation still uses String for dynamic formatting, which
currently pulls in a lot of formatting and panic machinery.
While a custom String type that pre-allocates the exact length of the
formatted String could avoid some of that, the formatting machinery is
still large even for simple types such as usize.
If `format_args!` is ever optimised for better inlining, the more
verbose and lower-level implementation could be reconsidered.
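The pre-allocation idea mentioned above could look roughly like the
following sketch. It is not the code from this commit: `LenCounter` and
`format_exact` are hypothetical names, and `std` is used here only so the
snippet runs on the host (device code would use `core::fmt` and an
allocator instead). It measures the formatted length in a first pass and
then writes into a String reserved to that length.

```rust
use std::fmt::{self, Write};

/// Hypothetical helper: counts the bytes a formatting call would produce.
struct LenCounter(usize);

impl Write for LenCounter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0 += s.len();
        Ok(())
    }
}

/// Hypothetical helper: formats `args` into a String whose capacity is
/// reserved up front, so the second pass does not need to reallocate.
fn format_exact(args: fmt::Arguments<'_>) -> String {
    // First pass: measure the output length without allocating.
    let mut counter = LenCounter(0);
    counter
        .write_fmt(args)
        .expect("a Display impl returned an error while counting");

    // Second pass: write into a buffer reserved to the measured length.
    let mut out = String::with_capacity(counter.0);
    out.write_fmt(args)
        .expect("a Display impl returned an error while writing");
    out
}
```

Calling `format_exact(format_args!("thread {} of {}", 3usize, 8usize))`
yields the same text as `format!` would. Both passes still go through
`core::fmt`, so this only removes reallocation, not the formatting
machinery itself, which matches the caveat above.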
Switch to using more vprintf in embedded CUDA kernel
Make print example fully executable
Clean up the print example
ptr_from_ref is stable from 1.76
Exit on CUDA panic instead of abort to allow the host to handle the error
Backup of early progress for switching from kernel traits to functions
More work into kernel functions instead of traits
Eliminate almost all ArgsTrait usages
Some refactoring of the async kernel func type + wrap code
Early sketch of extracting type wrapping from macro into types and traits
Early work towards using trait for kernel type wrap, ptx jit workaround missing
Lift complete CPU kernel wrapper from proc macro into public functions
Add async launch helper
Further cleanup of the new kernel param API
Start cleaning up the public API
Allow passing ThreadBlockShared to kernels again
Remove unsound mutable lending to CUDA for now
Allow passing ThreadBlockSharedSlice to kernel for dynamic shared memory
Begin refactoring the public API with device feature
Refactoring to prepare for better module structure
Extract kernel module just for parameters
Add RustToCuda impls for &T, &mut T, &[T], and &mut [T] where T: RustToCuda
Large restructuring of the module layout for rust-cuda
Split rust-cuda-kernel off from rust-cuda-derive
Update codecov action to handle rust-cuda-kernel
Fix clippy lint
Far too much time spent getting rid of DeviceCopy
More refactoring and auditing kernel param bounds
First exploration towards a stricter async CUDA API
More experiments with async API
Further API experimentation
Further async API experimentation
Further async API design work
Add RustToCudaAsync impls for &T and &[T], but not &mut T or &mut [T]
Add back mostly unchanged exchange wrapper + buffer with RustToCudaAsync impls
Add back mostly unchanged anti-aliasing types with RustToCudaAsync impls
Progress on replacing ...Async with Async<...>
Seal more implementation details
Further small API improvements
Add AsyncProj helper API struct for async projections
Disable async derive in examples for now
Implement RustToCudaAsync derive impls
Further async API improvements to add drop behaviour
First sketch of the safety constraints of a new NoSafeAliasing trait
First steps towards reintroducing LendToCudaMut
Fix no-std Box import for LendRustToCuda derive
Re-add RustToCuda implementation for Final
Remove redundant RustToCudaAsyncProxy
More progress on less 'static bounds on kernel params
Further investigation of less 'static bounds
Remove 'static bounds from LendToCuda ref kernel params
Make CudaExchangeBuffer Sync
Make CudaExchangeBuffer Sync v2
Add AsyncProj proj_ref and proj_mut convenience methods
Add RustToCudaWithPortableBitCloneSemantics adapter
Fix invalid const fn bounds
Add Deref[Mut] to the adapters
Fix pointer type inference error
Try removing __rust_cuda_ffi_safe_assert module
Ensure async launch mutable borrow safety with barriers on use and stream move
Fix uniqueness guarantee for Stream using branded types
Try without ref proj
Try add extract ref
Fix doc link
Clean up kernel signature check
Some cleanup before merging
Fix some clippy lints, add FIXMEs for others
Add docs for rust-cuda-derive
Small refactoring + added docs for rust-cuda-kernel
Bump MSRV to 1.77-nightly
Try trait-based kernel signature check
Try naming host kernel layout const
Try match against byte literal for faster comparison
Try with memcmp intrinsic
Try out experimental const-type-layout with compression
Try check
Try check again