forked from NVIDIA/cutlass
-
Notifications
You must be signed in to change notification settings - Fork 28
Input alignment #323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
t4c1
wants to merge
10
commits into
codeplaysoftware:sycl-develop
Choose a base branch
from
t4c1:input_alignment
base: sycl-develop
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Input alignment #323
Changes from all commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
0392b93
add gemm shapes
aacostadiaz ba5bec4
hacky padding
t4c1 5398c87
cleanup alignment and checks
t4c1 2cd3901
revert copy_debug[H
t4c1 9663d2c
revert input.in
t4c1 663be9d
Merge remote-tracking branch 'origin/sycl-develop' into input_alignment
t4c1 f28f933
Update examples/sycl/00_pvc_gemm/00_pvc_gemm.cpp
t4c1 821d949
Update examples/sycl/00_pvc_gemm/00_pvc_gemm.cpp
t4c1 be6b493
Merge branch 'sycl-develop' into input_alignment
t4c1 57e5614
addressed review comments
t4c1 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -162,11 +162,33 @@ class GemmUniversal< | |||||||||
auto m = get<0>(args.problem_shape); | ||||||||||
auto n = get<1>(args.problem_shape); | ||||||||||
auto k = get<2>(args.problem_shape); | ||||||||||
auto l = get<3>(args.problem_shape); | ||||||||||
bool is_batch = l > 1; | ||||||||||
// all these requirements are in bytes | ||||||||||
constexpr int inner_alignment_requirement = 16; | ||||||||||
constexpr int outer_alignment_requirement = 64; | ||||||||||
constexpr int width_alignment_requirement = 4; | ||||||||||
|
||||||||||
auto check_stride = [is_batch](auto stride, int el_size){ | ||||||||||
auto a = get<0>(stride); | ||||||||||
auto b = get<1>(stride); | ||||||||||
auto valid_is_unit = a == _1{} || b == _1{}; | ||||||||||
auto inner = a == _1{} ? b : a; | ||||||||||
Comment on lines
+175
to
+176
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why would you prefer that changed? If a check can be made at compile time, why not do it? |
||||||||||
auto valid_inner = inner % (inner_alignment_requirement / el_size) == 0; | ||||||||||
auto valid_outer = !is_batch || get<2>(stride) % (outer_alignment_requirement / el_size) == 0; | ||||||||||
return valid_is_unit && valid_inner && valid_outer; | ||||||||||
}; | ||||||||||
bool strides_valid = check_stride(args.mainloop.dA, sizeof(ElementA)) && | ||||||||||
check_stride(args.mainloop.dB, sizeof(ElementB)) && | ||||||||||
check_stride(args.epilogue.dC, sizeof(ElementC)) && | ||||||||||
check_stride(args.epilogue.dD, sizeof(ElementD)); | ||||||||||
// TODO(codeplay): base *_valid on the atom shapes | ||||||||||
bool m_valid = m > 0; | ||||||||||
bool n_valid = n > 0 && n % 4 == 0; | ||||||||||
bool k_valid = k > 0 && k % get<2>(TileShape{}) == 0; | ||||||||||
bool shape_implementable = (m_valid && n_valid && k_valid); | ||||||||||
bool n_valid = n > 0 && n % (width_alignment_requirement / sizeof(ElementB)) == 0 && | ||||||||||
n % (width_alignment_requirement / sizeof(ElementC)) == 0 && | ||||||||||
n % (width_alignment_requirement / sizeof(ElementD)) == 0; | ||||||||||
bool k_valid = k > 0 && k % (width_alignment_requirement / sizeof(ElementA)) == 0; | ||||||||||
bool shape_implementable = m_valid && n_valid && k_valid && strides_valid; | ||||||||||
|
||||||||||
bool mode_implementable = args.mode == GemmUniversalMode::kGemm || | ||||||||||
(args.mode == GemmUniversalMode::kBatched && rank(ProblemShape{}) == 4); | ||||||||||
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for a slightly cumbersome late suggestion - this didn't occur to me on my first review, but I wonder if we want to make this a dedicated example? Or, at least, apply it to an example other than
00_pvc_gemm
. Given this will be many developers first intro to cutlass-sycl, it would be nice not to hit them in the face with some potentially daunting alignment & padding calculations straight away.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but I am unsure if this is a big enough change for its own example. Also, alternatively we hit them with many inputs being unsupported. I am not sure. I would prefer to hear other opinions on this.