
Why Is bitnet.cpp Not Running My 1.58-Bit Model at the Expected Speed on CPU? #182


If you're seeing slower-than-expected inference speeds, the likely causes are a kernel mismatch or missing optimizations. Make sure the quantization type you pass (--quant-type i2_s or tl1) matches the model you downloaded, and verify that you're running the latest release of bitnet.cpp with the pre-tuned kernel parameters enabled (--use-pretuned), as these significantly affect performance. On x86 CPUs, only certain models support the fastest kernel (TL2), so check the kernel compatibility table in the README. Finally, confirm that your CPU supports AVX2 or AVX-512 instructions; performance drops sharply without them. Rebuilding with Clang 18+ and running with 4+ threads us…
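Since missing AVX2/AVX-512 support is one of the failure modes above, a quick way to check before rebuilding is to read the CPU flags directly. This is a minimal sketch, not part of bitnet.cpp itself; it assumes a Linux host where flags are exposed in /proc/cpuinfo (on macOS or Windows you'd need a different probe, e.g. sysctl or a cpuid library):

```python
# Minimal sketch: check whether this CPU advertises the AVX2 / AVX-512
# instruction sets that bitnet.cpp's fast x86 kernels rely on.
# Linux-only: parses the "flags" line of /proc/cpuinfo; on other
# platforms cpu_flags() simply returns an empty set.

def cpu_flags() -> set:
    """Return the set of CPU feature flags, or an empty set if unavailable."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    # Line looks like: "flags\t\t: fpu vme ... avx2 ..."
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

if __name__ == "__main__":
    flags = cpu_flags()
    print("AVX2:    ", "yes" if "avx2" in flags else "no")
    print("AVX-512F:", "yes" if "avx512f" in flags else "no")
```

If both report "no", the slowdown is expected: bitnet.cpp falls back to slower generic code paths on CPUs without these extensions.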

Answer selected by pawpatrolrockie