Version 0.6.2 RC1: Stream callback semantics change, bug fixes #476
eyalroz announced in Announcements
The most significant change in this version regards the way callbacks/host functions are supported. This change is motivated mostly as preparation for the upcoming introduction of CUDA graph support (not in this version), which will impose some stricter constraints on callbacks - precluding the hack we have been using so far.
So far, a callback was any object invokable with a cuda::stream_t parameter. From now on, we support two kinds of callback:
cuda::stream_t::enqueue_t::host_function_call(Argument * user_data)
cuda::stream_t::enqueue_t::host_invokable(Invokable& invokable)
This lets us avoid the combination of heap allocation at enqueue and deallocation at launch - which works well enough for now, but will not be possible when the same callback needs to be invoked multiple times. The old approach also contradicted our principle of not adding layers of abstraction over what CUDA itself provides.
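The difference between the two callback kinds, and the reason neither needs a heap allocation per callback, can be illustrated with a minimal sketch that uses neither CUDA nor this library. Everything in it (mock_stream_t, enqueue_function, enqueue_invokable, run_all, add_ten, demo) is a hypothetical name invented for the illustration; the only thing borrowed from CUDA is the assumption that a host function has the shape void (*)(void*). It is not the library's implementation.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Same shape as the host-function type CUDA accepts: one void* argument.
using host_fn_t = void (*)(void*);

// Hypothetical stand-in for a stream: it records (function, argument) pairs
// and runs them later, in order. It performs no CUDA calls.
class mock_stream_t {
public:
    // Kind 1: a plain function plus a pointer to caller-owned data.
    void enqueue_function(host_fn_t function, void* user_data)
    {
        assert(function != nullptr);
        queue_.emplace_back(function, user_data);
    }

    // Kind 2: an invokable taking no parameters, passed by reference.
    // Only its address is recorded; no copy of the invokable is made,
    // so the caller must keep it alive for as long as it may run.
    template <typename Invokable>
    void enqueue_invokable(Invokable& invokable)
    {
        queue_.emplace_back(&invoke_by_address<Invokable>, &invokable);
    }

    // Runs everything enqueued so far; may be called repeatedly.
    void run_all() const
    {
        for (const auto& entry : queue_) { entry.first(entry.second); }
    }

private:
    // Restores the invokable's type from the erased pointer and calls it.
    template <typename Invokable>
    static void invoke_by_address(void* type_erased)
    {
        (*static_cast<Invokable*>(type_erased))();
    }

    std::vector<std::pair<host_fn_t, void*>> queue_;
};

// A plain function usable as a kind-1 callback.
inline void add_ten(void* user_data)
{
    *static_cast<int*>(user_data) += 10;
}

// Enqueues one callback of each kind, runs the queue twice, and returns
// the two counters as (via function, via invokable).
inline std::pair<int, int> demo()
{
    int via_function = 0;
    int via_invokable = 0;
    auto increment = [&via_invokable]() { ++via_invokable; };

    mock_stream_t stream;
    stream.enqueue_function(&add_ten, &via_function);
    stream.enqueue_invokable(increment);
    stream.run_all();
    stream.run_all();
    return {via_function, via_invokable};
}
```

Because the mock stores only a function pointer and an address, the same entry can be run any number of times, which a scheme that frees its heap-allocated copy at launch cannot do. The price is that the caller owns the lifetime of the user data or invokable.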
Of course, the release also has the "usual" long list of minor fixes.
Changes to existing API
cuda::kernel::get() now takes a device, not a context - since it can't really do anything useful for non-primary contexts (apriori-compiled kernels are available in primary contexts)
API additions
cuda::memory::region_t's can now be passed when enqueueing copy operations on streams (and thus also cuda::span<T>'s)
cuda::memory::copy_parameters_t<N> (for N=2 or 3), a wrapper of the CUDA driver's richest parameters structure with multiple convenience functions, for maximum configurability of a copy operation. But this structure is not currently "fool-proof", so use with care and initialize all relevant fields.
cuda::pointer_t
Bug fixes
device::get() no longer incorrectly marked as noexcept
allocate_managed() in context.hpp
flush_remote_writes() operation on a stream (this is one of the "batch stream memory operations")
apriori_compiled_kernel_t::get_attribute() was missing an inline decoration (#449)
cuda::profiling::mark::range_start() and range_end() were calling create_attributions() the wrong way
Cleanup and warning avoidance
Compatibility
Other changes
constexpr
This discussion was created from the release Version 0.6.2 RC1: Stream callback semantics change, bug fixes.