Skip to content

Enable SM90 via sycl-cuda-compat #276

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: sycl-develop
Choose a base branch
from

Conversation

FMarno
Copy link
Collaborator

@FMarno FMarno commented Mar 24, 2025

Builds on #266 to enable a number of examples running on nvidia hopper with minimal changes to the code. This is achieved through the use of the sycl-cuda-compat flag.
This is a big step towards easy compatibility with the upstream CUTLASS.
Shoutout to @Naghasan for developing the sycl-cuda-compat feature for dpcpp.

@FMarno FMarno force-pushed the finlay/enable_sm90_sycl-cuda-compat branch 3 times, most recently from 2368a71 to a2254d3 Compare April 1, 2025 07:11
@FMarno FMarno force-pushed the finlay/enable_sm90_sycl-cuda-compat branch from 003b9d9 to c327f8e Compare April 1, 2025 08:51
@FMarno FMarno marked this pull request as ready for review April 1, 2025 08:51
Comment on lines +147 to +150
decltype(A),
decltype(B),
decltype(C),
decltype(D),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be D, A, B, C

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's quite ugly and error prone that type deduction doesn't work here.
I'm wondering whether it is worth to add a macro so that we can pass a type representing function and make type deduction for the arguments work.
Could look like: https://godbolt.org/z/8cxcM15ba
This can be done quite a bit nicer with c++20: https://godbolt.org/z/7fK1M5he3. But I don't know whether it is reasonable to require C++20 for CUTLASS_ENABLE_SYCL

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function parameter and the function template parameters are not in the same order, so the order A, B, C, D is correct. I agree though, this is a sharp edge of syclcompat.
I think this would have to be handled at the syclcompat level, since api of syclcompat::experimental::launch expects a function as a parameter.

template <
  class ATensor,
  class BTensor,
  class CTensor,
  class DTensor,
  class ElementAccumulator,
  class ElementEpilogue>
CUTLASS_GLOBAL
void
gett_kernel(
  DTensor       D,
  ATensor const A,
  BTensor const B,
  CTensor const C,
  ElementEpilogue alpha, ElementEpilogue beta,
  ElementAccumulator acc_init)
{

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I overlooked that the template arguments are in a different order.

I agree that if we want to use syclcompat for launch we would need to ask them to add this.

Do we gain anything from using syclcompat::launch here rather than directly calling default_queue().submit(...)? Only thing I see syclcompat::launch does is call transform_nd_range. But for 3d it doesn't do anything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the only benefit is handling the conversion of sycl_grid and sycl_block into sycl::nd_range. The code you're suggestions would look like this:

  const syclcompat::dim3 sycl_grid(dimGrid.x, dimGrid.y, dimGrid.z);
  const syclcompat::dim3 sycl_block(dimBlock.x, dimBlock.y, dimBlock.z);
  syclcompat::get_default_queue().parallel_for(sycl::nd_range<3>{sycl_grid * sycl_block, sycl_block},
      [=](sycl::nd_item<3>) {
        [[clang::always_inline]] gett_kernel(D, A, B, C, alpha, beta, ElementAccumulator(0));
      });

For cases that use syclcompat::experimental::launch, it also handles properties like SLM size, cluster parameters, subgroup size etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should try to solve it in syclcompat (if at all). Do you want to suggest it or do you want me to open an issue?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll open an issue on the intel/llvm repo and bring it up internally within Codeplay. I'll also send you a link to the issue if you want to pass it on to anyone.

Copy link
Collaborator Author

@FMarno FMarno Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created that here #17832

Copy link
Collaborator Author

@FMarno FMarno Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rolandschulz After a bit of Codeplay internal discussion, we're not sure it can be solved in C++17 without modifying the syclcompat api to accept a functor class as the template argument, which could be considered veering from the point of syclcompat.
Is it ok if I resolve this for now so the code can be merged?

Copy link
Collaborator

@aacostadiaz aacostadiaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!!!

Comment on lines +245 to +246
syclcompat::dim3 sycl_grid(grid.x, grid.y, grid.z);
syclcompat::dim3 sycl_block(block.x, block.y, block.z);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of these two lines? Couldn't you use dimBlock/dimGrid directly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no conversion between dim3 and syclcompat::dim3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants