Releases: DefTruth/CUDA-Learn-Notes
v2.5
What's Changed
- [HGEMM] Update HGEMM README.md by @DefTruth in #120
- [HGEMM] Add plot tflops function by @DefTruth in #121 (see the sketch below)
- [HGEMM] Add NVIDIA RTX 3090 Laptop perf plot by @DefTruth in #122
- [PERF] Update HGEMM benchmark scripts by @DefTruth in #123
- [HGEMM] Add HGEMM L20/4090 benchmark figures by @DefTruth in #124
- Bump up to v2.5 by @DefTruth in #125
Full Changelog: v2.4.18...v2.5
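For context on the TFLOPS figures these releases quote, here is a minimal sketch of how an HGEMM throughput number is typically derived from a timed run. The function name and example sizes are illustrative only, not the repo's actual plotting/benchmark script.

```cuda
#include <cstdio>

// One MxNxK GEMM performs 2*M*N*K floating-point operations (a multiply and an
// add per MAC); dividing by the measured runtime gives achieved TFLOPS.
double hgemm_tflops(int M, int N, int K, double elapsed_ms) {
  double flops = 2.0 * (double)M * (double)N * (double)K;
  return flops / (elapsed_ms * 1e-3) / 1e12;
}

int main() {
  // e.g. a 4096x4096x4096 HGEMM measured at 1.2 ms reports ~114.5 TFLOPS,
  // which is the scale of the L20 numbers quoted in these release titles.
  printf("%.1f TFLOPS\n", hgemm_tflops(4096, 4096, 4096, 1.2));
  return 0;
}
```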
v2.4.18
What's Changed
- Update README.md by @DefTruth in #115
- [HGEMM] Update HGEMM Supported Matrix by @DefTruth in #116
- [HGEMM] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #117
- [README] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #118
- [HGEMM] Add NVIDIA RTX 4090 benchmark by @DefTruth in #119
Full Changelog: v2.4.17...v2.4.18
v2.4.17
What's Changed
- [NMS] Add nms f32 cuda kernel. by @bear-zd in #102
- [HGEMM] Add some note to collective store by @DefTruth in #103
- [HGEMM] Add HGEMM MMA Col Major Kernel by @DefTruth in #104
- [HGEMM] Update HGEMM benchmark scripts by @DefTruth in #105
- [HGEMM] Add Warp Swizzle as template param by @DefTruth in #106
- [HGEMM] add -Xptxas -v compile flag by @DefTruth in #107 (see the sketch below)
- [HGEMM] Try to reduce register usage by @DefTruth in #108
- [HGEMM] Update HGEMM MMA/WMMA Usage by @DefTruth in #109
- [HGEMM][Docs] Add HGEMM Supported Matrix by @DefTruth in #110
- [HGEMM] Add M=N=K option for benchmark by @DefTruth in #111
- [HGEMM] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #112
- [README] Update HGEMM/SGEMM Supported matrix by @DefTruth in #113
- [Docs] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #114
Full Changelog: v2.4.16...v2.4.17
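#107 and #108 are both about register pressure. As a hedged illustration of the workflow: compile any kernel with `-Xptxas -v` and ptxas prints per-kernel register and shared-memory usage, which is the number #108 tries to bring down. The kernel below is a stand-in, not the repo's HGEMM kernel.

```cuda
// Compile with, e.g.:  nvcc -O3 -arch=sm_89 -Xptxas -v reg_demo.cu -c
// ptxas then reports a line per kernel such as "Used N registers, ... smem".
#include <cuda_fp16.h>

__global__ void copy_f16x8(const half* __restrict__ x, half* __restrict__ y, int n) {
  int i = (blockIdx.x * blockDim.x + threadIdx.x) * 8;
  if (i + 7 < n) {
    // 128-bit vectorized copy: fewer memory instructions, but the wide value
    // keeps more registers live at once, which -Xptxas -v makes visible.
    float4 v = reinterpret_cast<const float4*>(x + i)[0];
    reinterpret_cast<float4*>(y + i)[0] = v;
  }
}
```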
HGEMM Warp Swizzle/Reg Buffers
HGEMM Up to 115 TFLOPS: L20
What's Changed
Full Changelog: v2.4.13...v2.4.15
HGEMM Up to 113 TFLOPS: L20
What's Changed
- [Mat][Trans] Add f32/f32x4 row/col first kernel by @bear-zd in #89
- [Docs][Contribute] Add How to contribute Notes by @DefTruth in #90
- [HGEMM] optimize SMEM padding, up to 113 TFLOPS by @DefTruth in #92 (see the sketch below)
- [Mat][Trans] Add f32x4_shared/bcf row/col first kernel. by @bear-zd in #91
- [Docs] rename mat_transpose -> mat-transpose by @DefTruth in #93
- [HGEMM] Add GeForce RTX 3080 Laptop benchmark by @DefTruth in #94
- [HGEMM] update HGEMM benchmark option by @DefTruth in #95
- [HGEMM] Refactor HGEMM WMMA 161616 kernels by @DefTruth in #96
- [HGEMM] Update HGEMM WMMA Benchmark by @DefTruth in #97
Full Changelog: v2.4.12...v2.4.13
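The bank-conflict-free transpose kernels (#91) and the SMEM padding optimization (#92) rest on the same idea: pad each shared-memory row so that column-wise accesses land on distinct banks. A minimal f32 transpose sketch of that idea, not the repo's exact kernels:

```cuda
// Launched with dim3 block(32, 32); padding the row to 33 floats makes a
// column-wise read hit 32 different banks instead of serializing 32-way.
__global__ void transpose32x32_padded(const float* __restrict__ in,
                                      float* __restrict__ out, int n) {
  __shared__ float tile[32][33];  // [32][32] would cause 32-way bank conflicts below
  int x = blockIdx.x * 32 + threadIdx.x;
  int y = blockIdx.y * 32 + threadIdx.y;
  if (x < n && y < n) tile[threadIdx.y][threadIdx.x] = in[y * n + x];
  __syncthreads();
  x = blockIdx.y * 32 + threadIdx.x;  // swap block coordinates for the write
  y = blockIdx.x * 32 + threadIdx.y;
  if (x < n && y < n) out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}
```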
v2.4.12 SGEMM TF32 Swizzle
What's Changed
- [SGEMM] SGEMM TF32 Thread Block Swizzle by @DefTruth in #84
- [HGEMM] mma4x4_warp4x4_stages with swizzle by @DefTruth in #86
- [SWISH] support Swish F32/F16 kernel by @wangzijian1010 in #85 (see the sketch below)
- [SGEMM] Update SGEMM TF32 Benchmark by @DefTruth in #87
New Contributors
- @wangzijian1010 made their first contribution in #85
Full Changelog: v2.4.11...v2.4.12
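A plain f32 sketch of the Swish activation added in #85 (kernel name and signature here are illustrative, not the repo's exact API): swish(x) = x * sigmoid(x).

```cuda
#include <math.h>

__global__ void swish_f32(const float* __restrict__ x, float* __restrict__ y, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float v = x[i];
    y[i] = v / (1.0f + expf(-v));  // x * sigmoid(x), folded into one division
  }
}
```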
v2.4.11 HGEMM Block Swizzle
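This release's title names the HGEMM thread block swizzle. As a hedged illustration of the general grouped-tile-ordering technique (blocks launched close together in time work on C tiles that share A/B data, improving L2 reuse); the repo's actual index mapping may differ:

```cuda
// Maps a 1-D linear block index over all C tiles to a swizzled (tile_m, tile_n):
// tiles are walked in narrow groups of group_m rows rather than a full row at a time.
__device__ void swizzled_tile(int grid_m, int grid_n, int group_m,
                              int* tile_m, int* tile_n) {
  int pid   = blockIdx.x;            // 1-D launch over grid_m * grid_n tiles
  int width = group_m * grid_n;      // number of blocks in one group of rows
  int group = pid / width;
  int first_m = group * group_m;
  int rows = min(grid_m - first_m, group_m);   // last group may be shorter
  *tile_m = first_m + (pid % width) % rows;
  *tile_n = (pid % width) / rows;
}
```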
v2.4.10 SGEMM TF32 Stage 2/3
What's Changed
- [HGEMM] HGEMM WMMA Stage mma4x2+warp4x4 by @DefTruth in #76
- [SGEMM] Add SGEMM WMMA TF32 Stage2/3 by @DefTruth in #77
- [SGEMM] Add cuBLAS SGEMM F32/TF32 baseline by @DefTruth in #78
- [SGEMM] Add Kernel cudaFuncSetAttribute hint by @DefTruth in #79 (see the sketch below)
- [RoPE] Add minimal RoPE f32/f32x4 pack impl by @bear-zd in #80
Full Changelog: v2.4.9...v2.4.10
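The cudaFuncSetAttribute hint from #79 matters for the stage-2/3 kernels: with multiple pipeline stages the dynamic shared-memory footprint can exceed the default 48 KB per-block limit, so the limit has to be raised before launch. A sketch with illustrative tile sizes and names (not the repo's exact configuration; error checking omitted):

```cuda
#include <cuda_runtime.h>

__global__ void sgemm_tf32_stage3(const float* A, const float* B, float* C,
                                  int M, int N, int K) {
  extern __shared__ float smem[];  // 3 stages of A/B tiles live here
  // main loop elided in this sketch
}

void launch(const float* A, const float* B, float* C, int M, int N, int K) {
  // 3 stages of 128x32 A tiles and 32x128 B tiles: 96 KB, above the 48 KB default.
  size_t smem_bytes = 3 * (128 * 32 + 32 * 128) * sizeof(float);
  cudaFuncSetAttribute(sgemm_tf32_stage3,
                       cudaFuncAttributeMaxDynamicSharedMemorySize,
                       (int)smem_bytes);
  dim3 grid((N + 127) / 128, (M + 127) / 128), block(256);
  sgemm_tf32_stage3<<<grid, block, smem_bytes>>>(A, B, C, M, N, K);
}
```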
v2.4.9 HGEMM WMMA Stage
What's Changed
- [HGEMM] Add HGEMM WMMA Double Buffers by @DefTruth in #69
- [Embedding] Add embedding kernel f32/x4/x4_pack, f16/x8/x8_pack by @bear-zd in #68
- [HGEMM] Add HGEMM mma4x2, warp2x4x2 kernel by @DefTruth in #70
- [HGEMM] HGEMM WMMA with Reg double buffers by @DefTruth in #71
- [HGEMM] Add HGEMM WMMA Stage 3/4 Kernel by @DefTruth in #74
- [Softmax] Add online softmax f32x4 pack kernel by @bear-zd in #73 (see the sketch below)
- [HGEMM][Bugfix] fix HGEMM Stage cp.async error by @DefTruth in #75
Full Changelog: v2.4.8...v2.4.9
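The online softmax from #73 computes the row max and the normalizer in a single pass. A deliberately simple one-thread-per-row sketch of the recurrence (the repo's kernel is a packed f32x4 block-level version):

```cuda
#include <math.h>
#include <float.h>

__global__ void online_softmax_rowwise(const float* __restrict__ x,
                                       float* __restrict__ y,
                                       int rows, int cols) {
  int row = blockIdx.x * blockDim.x + threadIdx.x;
  if (row >= rows) return;
  const float* xr = x + row * cols;
  float m = -FLT_MAX, d = 0.0f;
  for (int j = 0; j < cols; ++j) {
    float v = xr[j];
    float m_new = fmaxf(m, v);
    d = d * expf(m - m_new) + expf(v - m_new);  // rescale old sum, add new term
    m = m_new;
  }
  for (int j = 0; j < cols; ++j) {
    y[row * cols + j] = expf(xr[j] - m) / d;
  }
}
```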