
GPU MODE Resource Stream

https://discord.gg/gpumode

Here you will find a collection of CUDA-related material (books, papers, blog posts, YouTube videos, tweets, implementations, etc.). We also collect information on higher-level tools for performance optimization and kernel development, such as Triton and torch.compile() ... whatever makes the GPUs go brrrr.
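
To give a first taste of one of those higher-level tools, here is a minimal sketch of torch.compile() (assumes PyTorch 2.x; the function and tensor shapes are made up for illustration):

```python
import torch

# A plain Python function; torch.compile traces it and, via TorchInductor,
# generates fused GPU kernels (Triton on CUDA devices).
def gelu_scale(x: torch.Tensor) -> torch.Tensor:
    return torch.nn.functional.gelu(x) * 2.0

compiled = torch.compile(gelu_scale)

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)
y = compiled(x)  # first call triggers compilation; later calls reuse the compiled kernel
```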

Do you know a great resource we should add? Please see How to contribute.

Lectures / Reading Group Live Sessions

You can find a list of upcoming lectures under the Events option in the channel list (sidebar) of our Discord server.

Recordings of the weekly lectures are published on our YouTube channel. Material (code, slides) for the individual lectures can be found in the lectures repository.

1st Contact with CUDA

2nd Contact

Hazy Research

The MLSys-oriented research group at Stanford led by Chris Ré, with alumni including Tri Dao, Dan Fu, and many others. A goldmine.

Papers, Case Studies

Books

CUDA Courses

CUDA Grandmasters

Tri Dao

Tim Dettmers

Sasha Rush

Practice

PyTorch Performance Optimization

PyTorch Internals & Debugging

Code / Libs

Essentials

Profiling

Python GPU Computing

Advanced Topics, Research, Compilers

News

Technical Blog Posts

Hardware Architecture

GPU-MODE Community Projects

ring-attention

pscan

Triton Kernels / Examples

  • Unsloth, which implements custom kernels in Triton for faster QLoRA training
  • A custom implementation of relative position attention (link)
  • Tri Dao's Triton implementation of Flash Attention: flash_attn_triton.py
  • YouTube playlist: Triton Conference 2023
  • LightLLM, which ships different Triton kernels for different LLMs
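
To give a feel for what the kernels linked above look like, here is a minimal vector-add kernel in the style of the official Triton tutorials; a sketch only, and the names (add_kernel, add) and the block size of 1024 are illustrative, not taken from any of the projects above:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-element chunk.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard lanes past the end of the tensors
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Real kernels like Flash Attention follow the same pattern (program IDs, masked loads/stores, constexpr block sizes), just with much more elaborate tiling and accumulation logic.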

How to contribute

To share interesting CUDA-related links, please create a pull request for this file. See editing files in the GitHub documentation.

Or contact us on the GPU MODE discord server: https://discord.gg/gpumode