GPU Arch Microbenchmark

Prerequisites

Install the turingas compiler:

git clone git@github.com:daadaada/turingas.git
cd turingas
python setup.py install

Usage

mkdir build && cd build
cmake .. && make
python ../compile_sass.py -arch=<70|75|80>

Microbenchmark

1. Latency

Device		Turing RTX-2070
Global Latency	cycle	TBD
L2 Latency	cycle	236
L1 Latency	cycle	32
Shared Latency	cycle	23
Constant Latency	cycle	448
Constant L2 Latency	cycle	62
Constant L1 Latency	cycle	4

At 4 cycles, a constant L1 cache hit is about as fast as reading a register.
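
To compare these cycle counts with wall-clock time, the conversion can be sketched as below. The clock frequency is an assumption: 1620 MHz is the nominal boost clock of a reference RTX 2070, and this README does not state the clock the GPU ran at during measurement, so pass the real value if you know it. The name `cycles_to_ns` is hypothetical and not part of this repository.

```python
# Latencies in cycles, copied from the Latency table (Global is still TBD, so it is omitted).
LATENCY_CYCLES = {
    "L2": 236,
    "L1": 32,
    "Shared": 23,
    "Constant": 448,
    "Constant L2": 62,
    "Constant L1": 4,
}

# ASSUMPTION: nominal boost clock of a reference RTX 2070, in MHz.
ASSUMED_CLOCK_MHZ = 1620.0


def cycles_to_ns(cycles, clock_mhz=ASSUMED_CLOCK_MHZ):
    """Convert a cycle count to nanoseconds at the given clock (MHz)."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    for name, cycles in LATENCY_CYCLES.items():
        print(f"{name:12s} {cycles:4d} cycles  ~{cycles_to_ns(cycles):7.1f} ns")
```

Because the clock is a parameter, the same table can be re-expressed for a locked or measured clock without editing the cycle counts.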

2. Cache Linesize

Device		Turing RTX-2070
L2 Linesize	bytes	64
L1 Linesize	bytes	32
Constant L2 Linesize	bytes	256
Constant L1 Linesize	bytes	32
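
This README does not describe how the line sizes were measured. One common approach is to walk memory at a small fixed stride, record the latency of each access, and take the byte distance between consecutive slow accesses as the line size. The sketch below shows only that last analysis step. `infer_line_size` is a hypothetical helper, not repository code, and the trace in the demo is synthetic, built from the L1 and L2 latencies in the Latency table purely to exercise the function.

```python
def infer_line_size(latencies, stride_bytes, miss_threshold):
    """Infer a line size as the byte distance between consecutive misses.

    latencies      -- per-access latency in cycles, in access order
    stride_bytes   -- distance in bytes between consecutive accesses
    miss_threshold -- latencies at or above this value count as misses
    """
    miss_offsets = [
        i * stride_bytes
        for i, latency in enumerate(latencies)
        if latency >= miss_threshold
    ]
    gaps = {later - earlier for earlier, later in zip(miss_offsets, miss_offsets[1:])}
    if len(gaps) != 1:
        raise ValueError("misses are not evenly spaced; cannot infer a line size")
    return gaps.pop()


if __name__ == "__main__":
    # SYNTHETIC trace, not measured data: 4-byte stride, a miss (236 cycles)
    # on every 8th access and an L1 hit (32 cycles) otherwise.
    trace = [236 if i % 8 == 0 else 32 for i in range(32)]
    print(infer_line_size(trace, stride_bytes=4, miss_threshold=100))
```

The function raises instead of guessing when the misses are unevenly spaced, since noisy traces from a real GPU usually need filtering before this step.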

3. Reg Bankconflict

Instruction		With conflict	Without conflict
FFMA	CPI	1.758	1.484
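
CPI here is cycles per instruction, so lower is better. The two FFMA figures can be turned into a relative penalty with simple arithmetic, sketched below. The helper name `conflict_penalty` is hypothetical.

```python
FFMA_CPI_CONFLICT = 1.758     # from the Reg Bankconflict table
FFMA_CPI_NO_CONFLICT = 1.484  # from the Reg Bankconflict table


def conflict_penalty(cpi_conflict, cpi_no_conflict):
    """Return the fractional slowdown caused by bank conflicts."""
    return cpi_conflict / cpi_no_conflict - 1.0


if __name__ == "__main__":
    penalty = conflict_penalty(FFMA_CPI_CONFLICT, FFMA_CPI_NO_CONFLICT)
    print(f"FFMA slowdown from register bank conflicts: {penalty:.1%}")
```

For the numbers in the table this works out to a slowdown of roughly 18 percent.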

4. Shared Bankconflict

Memory Load		Turing RTX-2070
Single	cycle	23
Vector2 X 2	cycle	27
Conflict Strided	cycle	41
Conflict-Free Strided	cycle	32
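
Whether a strided access pattern conflicts can be predicted from the bank mapping. The sketch below assumes the layout documented in the CUDA programming guide for recent architectures: 32 banks, with successive 32-bit words assigned to successive banks. This README does not state which strides produced the numbers in the table, so the strides in the demo are illustrative only, and `conflict_degree` is a hypothetical helper.

```python
from collections import defaultdict

NUM_BANKS = 32  # ASSUMPTION: 32 shared memory banks, each one 32-bit word wide


def conflict_degree(stride_words, num_threads=32):
    """Worst-case number of distinct words that one warp requests from a single bank.

    Thread t accesses word t * stride_words. A result of 1 means conflict-free;
    a result of n means the access is serialized into n passes.
    """
    words_per_bank = defaultdict(set)
    for tid in range(num_threads):
        word = tid * stride_words
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


if __name__ == "__main__":
    for stride in (1, 2, 32, 33):
        print(f"stride {stride:2d} words -> {conflict_degree(stride)}-way")
```

Distinct words are counted per bank, not threads, so a stride of 0 (all threads reading the same word) is reported as conflict-free, which matches the hardware's broadcast behaviour.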

Citation

Jia, Zhe, et al. "Dissecting the NVIDIA Volta GPU architecture via microbenchmarking." arXiv preprint arXiv:1804.06826 (2018).
Jia, Zhe, et al. "Dissecting the NVidia Turing T4 GPU via microbenchmarking." arXiv preprint arXiv:1903.07486 (2019).
Yan, Da, Wei Wang, and Xiaowen Chu. "Optimizing batched winograd convolution on GPUs." Proceedings of the 25th ACM SIGPLAN symposium on principles and practice of parallel programming. 2020. (turingas)
