v2.4.9 HGEMM WMMA Stage

@DefTruth released this 13 Oct 09:15
· 51 commits to main since this release
3acd5e2

What's Changed

  • [HGEMM] Add HGEMM WMMA Double Buffers by @DefTruth in #69
  • [Embedding] Add embedding kernel f32/x4/x4_pack, f16/x8/x8_pack by @bear-zd in #68
  • [HGEMM] Add HGEMM mma4x2, warp2x4x2 kernel by @DefTruth in #70
  • [HGEMM] HGEMM WMMA with Reg double buffers by @DefTruth in #71
  • [HGEMM] Add HGEMM WMMA Stage 3/4 Kernel by @DefTruth in #74
  • [Softmax] Add online softmax f32x4 pack kernel by @bear-zd in #73
  • [HGEMM][Bugfix] Fix HGEMM Stage cp.async error by @DefTruth in #75

Full Changelog: v2.4.8...v2.4.9