Commit
[AMD] Hoist Q out of the loop for FA optimization (#4666)
Move writing to LDS and reading from LDS right after the loading of a tensor from global memory. This PR does the reordering by considering two possible patterns, depending on whether writing to LDS is done using a local_store instruction or an optional local_alloc argument:

1) load -> local_alloc -> local_store -> local_load (LDS written by a local_store instruction)
2) load -> local_alloc -> local_load (LDS written through the optional local_alloc argument)

---------

Co-authored-by: Ognjen Plavsic <[email protected]>
Co-authored-by: Lei Zhang <[email protected]>
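The two patterns named in the commit message can be illustrated with a toy model. The sketch below is not the pass from this commit (which operates on Triton's MLIR-based IR); it is a minimal Python illustration under stated assumptions. The `Op` record, its `deps` field, the op names, and the `hoist_lds_ops` helper are all hypothetical, and the sketch skips the legality checks a real compiler pass would need.

```python
from dataclasses import dataclass

# Kinds of ops that touch LDS in this toy model (names taken from the commit message).
LDS_KINDS = {"local_alloc", "local_store", "local_load"}


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for an IR operation."""
    name: str          # unique identifier of the op
    kind: str          # "load", "local_alloc", "local_store", "local_load", ...
    deps: tuple = ()   # names of ops that must run before this one


def hoist_lds_ops(ops):
    """Move the LDS ops fed by each global load to right after that load.

    Assumes `ops` is already in a valid execution order and that op names
    are unique. Relative order inside the moved chain is preserved.
    """
    by_name = {op.name: op for op in ops}
    result = list(ops)
    for load in [op for op in ops if op.kind == "load"]:
        chain, reached = [], {load.name}
        for op in ops:
            if op.kind in LDS_KINDS and reached.intersection(op.deps):
                # Pull in LDS prerequisites first, e.g. a local_alloc that
                # takes no argument and is only written by a local_store.
                for dep_name in op.deps:
                    dep = by_name[dep_name]
                    if dep.kind in LDS_KINDS and dep not in chain:
                        chain.append(dep)
                        reached.add(dep_name)
                chain.append(op)
                reached.add(op.name)
        for op in chain:
            result.remove(op)
        at = result.index(load) + 1
        result[at:at] = chain
    return result


# Pattern 1: load -> local_alloc -> local_store -> local_load
pattern1 = [
    Op("q", "load"),
    Op("a", "compute"),
    Op("buf", "local_alloc"),
    Op("b", "compute"),
    Op("st", "local_store", ("q", "buf")),
    Op("q_lds", "local_load", ("buf", "st")),
]

# Pattern 2: load -> local_alloc -> local_load (alloc takes the loaded tensor)
pattern2 = [
    Op("q", "load"),
    Op("a", "compute"),
    Op("buf", "local_alloc", ("q",)),
    Op("b", "compute"),
    Op("q_lds", "local_load", ("buf",)),
]

order1 = [op.name for op in hoist_lds_ops(pattern1)]
order2 = [op.name for op in hoist_lds_ops(pattern2)]
print(order1)
print(order2)
```

In both toy inputs the LDS ops end up directly after the global load `q`, ahead of the unrelated `compute` ops, which mirrors the reordering described above.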
1 parent a63a477, commit e192dba
Showing 2 changed files with 74 additions and 4 deletions.