Skip to content

tile-ai/tilelang-benchmark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Tile Lang Benchmark

Benchmark Summary

TileLang achieves exceptional performance across a variety of computational patterns. Below are selected results showcasing its capabilities:

  • Flash Attention Performance on H100

    operator performance on H100
  • Matmul Performance on GPUs (RTX 4090, A100, H100, MI300X)

    gemm fp16 performance on Gpus
  • Dequantize Matmul Performance on A100

    dequantize gemv performance on A100

Benchmark OP Set


Table 1: Matrix shapes in our benchmark

V0 V1 V2 V3 V4 V5 V6 V7
m 1 1 1 1 1 1 1 1
n 16384 43008 14336 57344 14336 9216 36864 9216
k 16384 14336 14336 14336 57344 9216 9216 36864
M0 M1 M2 M3 M4 M5 M6 M7
m 4096 4096 4096 4096 8192 8192 8192 8192
n 1024 8192 28672 8192 1024 8192 28672 8192
k 8192 8192 8192 28672 8192 8192 8192 28672

Table 2: FlashAttention shapes in our benchmark

FA0 FA1 FA2 FA3 FA4
batch 1 1 1 1 1
nheads 32 32 32 32 32
seq_len 512 512 1024 1024 4096
head_dim 128 128 128 128 128
causal true false true false true

Table 3: Linear Attention shapes in our benchmark

CC0 CC1 CC2 CC3 CC4 CC5
batch 1 1 1 64 64 64
nheads 64 64 64 64 64 64
seq_len 1024 2048 8192 1024 2048 8192
head_dim 64 64 64 64 64 64
d_state 128 128 128 128 128 128
CT0 CT1 CT2 CT3 CT4 CT5
batch 1 1 1 64 64 64
nheads 64 64 64 64 64 64
seq_len 1024 2048 8192 1024 2048 8192
head_dim 64 64 64 64 64 64
d_state 128 128 128 128 128 128

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •