v2.4.17
What's Changed
- [NMS] Add NMS f32 CUDA kernel by @bear-zd in #102
- [HGEMM] Add some notes to collective store by @DefTruth in #103
- [HGEMM] Add HGEMM MMA Col Major Kernel by @DefTruth in #104
- [HGEMM] Update HGEMM benchmark scripts by @DefTruth in #105
- [HGEMM] Add Warp Swizzle as template param by @DefTruth in #106
- [HGEMM] Add -Xptxas -v compile flag by @DefTruth in #107
- [HGEMM] Try to reduce register usage by @DefTruth in #108
- [HGEMM] Update HGEMM MMA/WMMA Usage by @DefTruth in #109
- [HGEMM][Docs] Add HGEMM Supported Matrix by @DefTruth in #110
- [HGEMM] Add M=N=K option for benchmark by @DefTruth in #111
- [HGEMM] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #112
- [README] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #113
- [Docs] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #114
Full Changelog: v2.4.16...v2.4.17