[HGEMM] Add MMA 16816 swizzle, Up to 115 TFLOPS (#98)
* Update hgemm_mma.cu

* Update README.md

* Update hgemm.py

* Update hgemm.cu

* Update hgemm_mma.cu

* Update hgemm.cu

* Update hgemm.py

* Update README.md

* Update hgemm_mma.cu

* Update hgemm.py

* Update hgemm.cu

* Update hgemm_mma.cu

* Update README.md

* Update hgemm.py

* Update README.md

* Update README.md

* Update hgemm_mma_stage.cu

* Update hgemm.py

* Update hgemm.cu

* Update README.md

* Update README.md

* Update hgemm_mma_stage.cu

* Update hgemm_mma_stage.cu
DefTruth authored Oct 21, 2024
1 parent 0aeb450 commit a2934b9
Showing 6 changed files with 1,247 additions and 314 deletions.
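The "swizzle" in the commit title refers to permuting shared-memory addresses so that `ldmatrix` loads for the MMA fragments hit distinct banks. The sketch below is hypothetical (it is not the code from this commit; the function name, 64-half row width, and XOR pattern are illustrative assumptions), but it shows the XOR-based index math such kernels typically use:

```cuda
#include <cstdint>

// Hypothetical XOR swizzle for a shared-memory tile of __half, laid out as
// rows of 64 halves, accessed in 8-half (16-byte) chunks. XOR-ing the chunk
// column with low bits of the row spreads the chunks of consecutive rows
// across different banks, so ldmatrix reads avoid bank conflicts.
// NOT the actual kernel code from this commit.
__device__ __forceinline__ int swizzle_offset(int row, int col8) {
  int swizzled_col8 = col8 ^ (row & 7);  // example permutation
  return row * 64 + swizzled_col8 * 8;   // offset in half elements
}
```

With this mapping, the same logical chunk column lands in eight different physical chunk positions across eight consecutive rows, which is the bank-conflict-avoidance property the swizzle exists to provide.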
6 changes: 5 additions & 1 deletion README.md
@@ -147,6 +147,10 @@
 | ✔️ [hgemm_wmma_m32n8k16....dbuf*](./hgemm/hgemm_wmma.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
 | ✔️ [hgemm_wmma_m16n16k16...stages*](./hgemm/hgemm_wmma_stage.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
 | ✔️ [hgemm_wmma_m16n16k16...swizzle*](./hgemm/hgemm_wmma_stage.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
+| ✔️ [hgemm_mma_m16n8k16...naive*](./hgemm/hgemm_mma.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
+| ✔️ [hgemm_mma_m16n8k16...mma2x4*](./hgemm/hgemm_mma.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
+| ✔️ [hgemm_mma_m16n8k16...stages*](./hgemm/hgemm_mma_stage.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
+| ✔️ [hgemm_mma_m16n8k16...swizzle*](./hgemm/hgemm_mma_stage.cu)|f16|f16|[link](./hgemm/)|⭐️⭐️⭐️|
 | ✔️ [sgemv_k32_f32](./sgemv/sgemv.cu)|f32|f32|[link](./sgemv/)|⭐️⭐️⭐️|
 | ✔️ [sgemv_k128_f32x4](./sgemv/sgemv.cu)|f32|f32|[link](./sgemv/)|⭐️⭐️⭐️|
 | ✔️ [sgemv_k16_f32](./sgemv/sgemv.cu)|f32|f32|[link](./sgemv/)|⭐️⭐️⭐️|
@@ -158,7 +162,7 @@
 | ✔️ [hard_nms cpp only](./nms/nms.cc)|f32|/|/|⭐️|
 | ✔️ [notes v1(deprecated)](./notes-v1.cu)|f32|f32|/|⭐️|
 
-👉TIPS: * means using **Tensor Cores(MMA PTX)**, otherwise, using CUDA Cores by default.
+👉TIPS: * means using **Tensor Cores(MMA/WMMA)**, otherwise, using CUDA Cores by default.
 
 ## 0x01 📖 博客目录

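The table entries marked `*` use the Tensor Core MMA PTX path. A minimal sketch of the single `mma.m16n8k16` instruction those kernels are built around (assumptions: sm_80 or newer, f16 accumulate, row-major A and column-major B fragments; this wrapper is illustrative, not this repository's actual code):

```cuda
#include <cstdint>

// One warp-wide m16n8k16 Tensor Core MMA: D = A * B + C, all in f16.
// Each thread holds its fragment slices packed into 32-bit registers:
// A is 4 regs, B is 2 regs, C and D are 2 regs per thread.
// Illustrative sketch, NOT the repository's actual kernel code.
__device__ __forceinline__ void mma_m16n8k16_f16(
    uint32_t *RD, const uint32_t *RA, const uint32_t *RB, const uint32_t *RC) {
  asm volatile(
      "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
      "{%0, %1}, {%2, %3, %4, %5}, {%6, %7}, {%8, %9};\n"
      : "=r"(RD[0]), "=r"(RD[1])
      : "r"(RA[0]), "r"(RA[1]), "r"(RA[2]), "r"(RA[3]),
        "r"(RB[0]), "r"(RB[1]),
        "r"(RC[0]), "r"(RC[1]));
}
```

The `naive`, `mma2x4`, `stages`, and `swizzle` variants in the table differ in how many of these MMAs each warp issues per iteration and how the fragments are staged through shared memory, not in the core instruction itself.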