There is not enough said about how much better the quality of inference is with Exl2 and exllamav2 over GGUF. #671
timrohrbaugh
started this conversation in
General
Replies: 1 comment
-
It could be even better if you try this commit: #712
-
It's night and day. I came here for performance and for engineering choices that are viable in real production. As you all know, with TP (tensor parallelism) set, 2 cards give you 25% additional performance from each card and 4 cards give you 50% from each card. But when you start looking objectively at inference for simulated reasoning, the quality difference is crazy. TurboDerp has not been here in a while, but I really wanted to put it out there that his work is appreciated.