v2.4.9 HGEMM WMMA Stage
What's Changed
- [HGEMM] Add HGEMM WMMA Double Buffers by @DefTruth in #69
- [Embedding] Add embedding kernel f32/x4/x4_pack, f16/x8/x8_pack by @bear-zd in #68
- [HGEMM] Add HGEMM mma4x2, warp2x4x2 kernels by @DefTruth in #70
- [HGEMM] Add HGEMM WMMA with register double buffers by @DefTruth in #71
- [HGEMM] Add HGEMM WMMA Stage 3/4 Kernel by @DefTruth in #74
- [Softmax] Add online softmax f32x4 pack kernel by @bear-zd in #73
- [HGEMM][Bugfix] Fix HGEMM Stage cp.async error by @DefTruth in #75
Full Changelog: v2.4.8...v2.4.9