You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This commit was created on GitHub.com and signed with GitHub’s verified signature.
The key has expired.
Added
Added support for FP16 Alt Round Near Zero Mode (this feature allows the generation of alternate kernels with intermediate rounding instead of truncation)
Added user-driven solution selection feature
Optimizations
Enabled LocalSplitU with MFMA for I8 data type
Optimized K mask code in mfmaIter
Enabled TailLoop code in NoLoadLoop to prefetch global/local read
Enabled DirectToVgpr in TailLoop for NN, TN, and TT matrix orientations
Optimized DirectToLds test cases to reduce the test duration
Changed
Removed DGEMM NT custom kernels and related test cases
Changed noTailLoop logic to apply noTailLoop only for NT
Changed the range of AssertFree0ElementMultiple and Free1
Unified aStr, bStr generation code in mfmaIter
Fixed
Fixed LocalSplitU mismatch issue for SGEMM
Fixed BufferStore=0 and Ldc != Ldd case
Fixed mismatch issue with TailLoop + MatrixInstB > 1