[AMD] Enable masked load and pointer canonicalization pass #4638
Conversation
Can you give a concrete example of this effect?
On my local machine (gfx11 card), if I run:
So a 13% reduction in the number of registers for the best config.
I also want to underline that this PR is blocked by a recent PR, #4369. We will need to add
Upon further thought, I sank the emission of
Cool, thanks! Just two small nits.
Could you fix the test and update the pull request message too?
Unfortunately, the issue with the test revealed something more problematic about the pass. I might do a separate PR about this.
Hi @antiagainst, this is the PR that fixes the issue: #4678
Hi @antiagainst, I rebased, fixed the nits, and also addressed the fact that we are now using
Hi @antiagainst, could you start the tests again?
This PR is doing two things:
1. It emits `llvm.masked.{load,store}` intrinsics for masked memory accesses, which means the backend takes responsibility for lowering the stores/loads.
2. It enables the pointer canonicalization pass.

The reason I am enabling both at the same time is that I saw a minor regression with `llvm.masked.{load,store}` alone, which seems to go away when the pointer canonicalization is also applied. This combination also seems to reduce the number of VGPRs used (at least for GEMM kernels).
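For reference, this is roughly what the `llvm.masked.load`/`llvm.masked.store` intrinsic form looks like in LLVM IR (a sketch, not taken from this PR; the vector width, alignment, and address space are illustrative assumptions):

```llvm
; Masked load: read <4 x float> from %ptr, but only in lanes where %mask is
; true; masked-off lanes take the corresponding lane of %passthru.
%v = call <4 x float> @llvm.masked.load.v4f32.p1(ptr addrspace(1) %ptr, i32 16, <4 x i1> %mask, <4 x float> %passthru)

; Masked store: write %val, again only in lanes where %mask is true.
call void @llvm.masked.store.v4f32.p1(<4 x float> %val, ptr addrspace(1) %ptr, i32 16, <4 x i1> %mask)
```

Because the masking is carried by the intrinsic instead of explicit per-access branching, the backend chooses how to lower each access, which is what makes the register-usage differences discussed above possible.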