Skip to content

RFC: test out new syntax for launch with type deduction #305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: sycl-develop
Choose a base branch
from

Conversation

rolandschulz
Copy link
Collaborator

Prototype to test out a solution for intel/llvm#17832 . Looking for review on the syntax. Is it OK to have to wrap the function in a lambda to gain the type deduction? Should the lambda be past as first or last argument?

@t4c1
Copy link
Collaborator

t4c1 commented Apr 14, 2025

Looks cleaner to me, but I would not define stuff in syclcompat namespace within cutlass

@rolandschulz
Copy link
Collaborator Author

I agree in general we shouldn't put things into syclcompat namespace.

For this PR I put it in syclcompat, because the purpose of this PR is to protype what we want to ask syclcompat to add (intel/llvm#17832).

Assuming it gets added to syclcompat and we switch our code to use it, we would require the latest version of syclcompat. That doesn't work because we want to support the latest released DPC++ version. Would it be reasonable to have in our headers the extra syclcompat code for older versions of DPC++ to avoid this dependency?

Comment on lines +42 to +60
namespace syclcompat {
template <class F, int Dim>
sycl::event launch(const sycl::nd_range<Dim> &range, sycl::queue q, const F& f) {
return q.parallel_for(detail::transform_nd_range<Dim>(range), [=](sycl::nd_item<Dim>) { f(); });
}
template <class F, int Dim>
sycl::event launch(const sycl::nd_range<Dim> &range, const F& f) {
return launch(range, get_default_queue(), f);
}
// Alternative launch through dim3 objects
template <class F>
sycl::event launch(const dim3 &grid, const dim3 &threads, sycl::queue q, const F& f) {
return launch(sycl::nd_range<3>{grid * threads, threads}, q, f);
}
template <class F>
sycl::event launch(const dim3 &grid, const dim3 &threads, const F& f) {
return launch(grid, threads, get_default_queue(), f);
}
}
Copy link
Collaborator

@mehdi-goli mehdi-goli Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we put those as PR on sycl::compat repo?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FMarno
Copy link
Collaborator

FMarno commented Apr 16, 2025

We can add the dim3 overloads into the sycl-cuda-compat PR (#276) under a different namespace while we wait for upstream changes to make it to the release. Should be no harm in that.

B, dB, sB, tB,
C, dC, sC, tC,
alpha, beta);
auto event = syclcompat::launch(dimGrid, dimBlock, [=]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you only want to record this event if you are doing fine-grained profiling?
It is possible you could see a notable performance improvement by only recording events when required: see https://github.com/intel/llvm/pull/18021/files#r2048537558

Copy link

@JackAKirk JackAKirk Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also if an analogue to cudaEventRecord in sycl would be useful to you then you could request creating an extension for e.g. oneapi_event_record that takes a sycl event.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants