Skip to content

[SYCL][COMPAT] Add launch lambda overload #18021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: sycl
Choose a base branch
from

Conversation

rolandschulz
Copy link
Contributor

Implements #17832

@rolandschulz rolandschulz requested a review from a team as a code owner April 15, 2025 00:51
// Overload taking zero-argument function object
template <class F, int Dim>
std::enable_if_t<std::is_invocable_v<const F &>, sycl::event>
launch(const F &f, const sycl::nd_range<Dim> &range,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure what order of argument is best:

  • Advantages of lambda first (this): function name (as part of lambda) is first like in cuda and other overloads
  • Advantage of lambda last (like in https://github.com/codeplaysoftware/cutlass-fork/pull/305/files): consistent with SYCL parallel_for and sycl_khr_free_function_commands. Function arguments (as part of lambda) are last like in cuda and other overloads.

}
// Alternative launch through dim3 objects
template <class F>
std::enable_if_t<std::is_invocable_v<const F &>, sycl::event>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that this is unchanged, but I think you will want an overload of these launch functions that do not return an event, and that use the new enqueue_functions API (nd_launch): see https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc
This is much closer to native cuda and does lead to a very noticeable performance improvement for small kernels (relevant to cutlass), because the overhead of creating events in cuda/hip is big.

I do not know whether having the function calling nd_launch in the implementation might change other design decisions here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we want a version which doesn't return an event and calls nd_launch. I don't think internally calling nd_launch will change the design decision here.

An overload isn't a good way to add a version which doesn't return an event. Given that one can't overload by return type, it would need to be a property or something like it to opt-out of the event. But getting an event back should be opt-in nor opt-out.

It would be best if we could just remove the return value. But I assume we don't want to break API.

If so I think we need to add a new function like launch2 or launch_fast or something similar which doesn't return an event.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants