-
Notifications
You must be signed in to change notification settings - Fork 28
Input alignment #323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl-develop
Are you sure you want to change the base?
Input alignment #323
Conversation
auto valid_is_unit = a == _1{} || b == _1{}; | ||
auto inner = a == _1{} ? b : a; | ||
auto valid_inner = inner % (16 / el_size) == 0; | ||
auto valid_outer = !is_batch || get<2>(stride) % (64 / el_size) == 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to test:
the Memory Width is greater than or equal to 64 bytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can not check on BMG right now, but IIRC, it was working correctly there even if that was not the case. I did check on PVC.
Co-authored-by: Joe Todd <[email protected]>
Co-authored-by: Joe Todd <[email protected]>
@@ -82,6 +82,11 @@ using namespace cute; | |||
|
|||
/////////////////////////////////////////////////////////////////////////////////////////////////// | |||
|
|||
// The alignment requirement in bytes on inner dimmension that will work for both PVC and BMG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for a slightly cumbersome late suggestion - this didn't occur to me on my first review, but I wonder if we want to make this a dedicated example? Or, at least, apply it to an example other than 00_pvc_gemm
. Given this will be many developers first intro to cutlass-sycl, it would be nice not to hit them in the face with some potentially daunting alignment & padding calculations straight away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but I am unsure if this is a big enough change for its own example. Also, alternatively we hit them with many inputs being unsupported. I am not sure. I would prefer to hear other opinions on this.
auto valid_is_unit = a == _1{} || b == _1{}; | ||
auto inner = a == _1{} ? b : a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto valid_is_unit = a == _1{} || b == _1{}; | |
auto inner = a == _1{} ? b : a; | |
auto valid_is_unit = a == 1 || b == 1; | |
auto inner = a == 1 ? b : a; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you prefer that changed? If a check can be made at compile time, why not do it?
auto b = get<1>(stride); | ||
auto valid_is_unit = a == _1{} || b == _1{}; | ||
auto inner = a == _1{} ? b : a; | ||
auto valid_inner = inner % (16 / el_size) == 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
16 is a bit of a magic number here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
bool n_valid = n > 0 && n % 4 == 0; | ||
bool k_valid = k > 0 && k % get<2>(TileShape{}) == 0; | ||
bool shape_implementable = (m_valid && n_valid && k_valid); | ||
bool n_valid = n > 0 && n % (4 / sizeof(ElementB)) == 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and 4 is a magic number here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
auto valid_is_unit = a == _1{} || b == _1{}; | ||
auto inner = a == _1{} ? b : a; | ||
auto valid_inner = inner % (16 / el_size) == 0; | ||
auto valid_outer = !is_batch || get<2>(stride) % (64 / el_size) == 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
64 is a magic number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -225,17 +248,27 @@ struct ExampleRunner { | |||
auto problem_shape_MNKL = cute::append<4>(problem_size, 1); | |||
auto [M, N, K, L] = problem_shape_MNKL; | |||
|
|||
// Padded values | |||
int N_B = cute::ceil_div(N, AlignElemB) * AlignElemB; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a cute::round_up
which I think does what you are doing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced
Add checks for alignment of inputs and show how to set them in an example.