There is not enough said about how much better the quality of inference is with Exl2 and exllamav2 over GGUF. #671
timrohrbaugh started this conversation in General
Replies: 0 comments
It's night and day. I came here for performance and for engineering choices that are viable in real production. As you all know, with TP set, 2x cards get you 25% additional performance from each card, and 4x cards get you 50% from each card. But when you start looking objectively at inference for simulated reasoning, the quality difference is crazy. TurboDerp has not been here in a while, but I really wanted to put it out there that his work is appreciated.