-
Notifications
You must be signed in to change notification settings - Fork 768
[SYCL][COMPAT] Add launch lambda overload #18021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
Conversation
// Overload taking zero-argument function object | ||
template <class F, int Dim> | ||
std::enable_if_t<std::is_invocable_v<const F &>, sycl::event> | ||
launch(const F &f, const sycl::nd_range<Dim> &range, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure what order of argument is best:
- Advantages of lambda first (this): function name (as part of lambda) is first like in cuda and other overloads
- Advantage of lambda last (like in https://github.com/codeplaysoftware/cutlass-fork/pull/305/files): consistent with SYCL parallel_for and sycl_khr_free_function_commands. Function arguments (as part of lambda) are last like in cuda and other overloads.
} | ||
// Alternative launch through dim3 objects | ||
template <class F> | ||
std::enable_if_t<std::is_invocable_v<const F &>, sycl::event> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that this is unchanged, but I think you will want an overload of these launch functions that do not return an event, and that use the new enqueue_functions API (nd_launch): see https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc
This is much closer to native cuda and does lead to a very noticeable performance improvement for small kernels (relevant to cutlass), because the overhead of creating events in cuda/hip is big.
I do not know whether having the function calling nd_launch in the implementation might change other design decisions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we want a version which doesn't return an event and calls nd_launch. I don't think internally calling nd_launch will change the design decision here.
An overload isn't a good way to add a version which doesn't return an event. Given that one can't overload by return type, it would need to be a property or something like it to opt-out of the event. But getting an event back should be opt-in nor opt-out.
It would be best if we could just remove the return value. But I assume we don't want to break API.
If so I think we need to add a new function like launch2
or launch_fast
or something similar which doesn't return an event.
Implements #17832