MMLU Performance #67
w013nad
started this conversation in Show and tell
I spent some time editing your MMLU test script to compare performance at different quantization levels. A full write-up is available here:
https://www.reddit.com/r/LocalLLaMA/comments/16tgzzk/exllamav2_performance_with_different_quantization/
If you're interested, I can try to merge my script into yours (I'm unfamiliar with Git). Feel free to use this chart if it's useful.
Full results:
https://docs.google.com/spreadsheets/d/1MFmHDpqcf7CP_EYnwl1QsUP0KhS6jA8x1JYuwm3NH0U/edit?usp=sharing
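In rough terms, the comparison the script does boils down to scoring each quant's predicted answer letters against the answer key and tabulating accuracy per bits-per-weight. Here's a minimal sketch of that loop; the function names and toy data are illustrative placeholders, not the actual script, which loads each ExLlamaV2 quant and extracts the predicted A/B/C/D choice from the model:

```python
# Sketch: compare MMLU-style accuracy across quantization levels.
# Model loading/inference is stubbed out with toy data; in the real
# script each entry would come from running one ExLlamaV2 quant.

def score_answers(predictions, gold):
    """Fraction of questions where the predicted letter matches the key."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

def compare_quants(results_by_quant, gold):
    """results_by_quant maps bits-per-weight -> list of predicted letters."""
    return {bpw: score_answers(preds, gold)
            for bpw, preds in sorted(results_by_quant.items())}

if __name__ == "__main__":
    # Toy stand-ins for real model outputs at two quant levels.
    gold = ["A", "C", "B", "D"]
    runs = {
        4.0: ["A", "C", "B", "A"],  # one miss
        6.0: ["A", "C", "B", "D"],  # all correct
    }
    for bpw, acc in compare_quants(runs, gold).items():
        print(f"{bpw:.2f} bpw: {acc:.2%}")
```

The real comparison in the write-up is just this, repeated over the full MMLU question set for each quant of the same base model.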