Stars
Minimalistic 4D-parallelism distributed training framework for educational purposes
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
GitHub mirror of the triton-lang/triton repo.
Tile primitives for speedy kernels
Fast and memory-efficient exact attention
Custom kernels in the Triton language for accelerating LLMs (see the minimal Triton sketch after this list)
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
Test and benchmark suite for sort implementations.
Commented (but unaltered) version of the original word2vec C implementation.
A curated list of awesome System Design (A.K.A. Distributed Systems) resources.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
A curated list to learn about distributed systems
Mcrouter is a memcached protocol router for scaling memcached deployments.
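As a rough illustration of the kind of kernel the Triton-related entries above deal with, here is a minimal vector-add sketch written in the Triton language, in the style of the upstream tutorials. It is not taken from any of the repositories in this list, and the names (`add_kernel`, `add`, `BLOCK_SIZE`) are illustrative only.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance processes one BLOCK_SIZE-wide tile of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged tail of the vector
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Launch one program per tile; triton.cdiv rounds the grid size up.
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = (triton.cdiv(n_elements, 1024),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out
```

The masked loads and stores are what let a fixed tile size cover inputs whose length is not a multiple of `BLOCK_SIZE`; the same tiling-plus-mask pattern scales up to the attention and LLM kernels in the repositories above.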