There is not enough said about how much better the quality of inference is with Exl2 and exllamav2 over GGUF. #671

timrohrbaugh · 2024-11-11T23:40:24Z

It's night and day. I came here for performance and for engineering choices that are viable in real production. As you all know, with multiple cards and TP (tensor parallelism) set, 2x cards get you 25% additional performance from each card and 4x cards get you 50% for each card. BUT when you start looking objectively at inference for simulated reasoning, the quality difference is crazy. TurboDerp has not been here in a while, but I really wanted to put it out there that his work is appreciated.

Originalimoc · 2025-01-26T16:46:30Z

It could be even better if you try this commit: #712.
Say you need 4.0 bpw. Use a script to quantize to 8.0 and also to 3.98, 3.99, 4.0, 4.01, and 4.02 (remember to reuse measurements.json so the measurement pass only runs once), then choose the one whose perplexity is closest to that of the 8.0 quant.
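
For anyone who wants to script the last step, here is a minimal sketch of the "pick the closest perplexity" selection. It assumes you have already measured perplexity for each quant yourself. The function name `pick_closest_to_reference` is hypothetical and the numbers are made-up placeholders, not real results.

```python
def pick_closest_to_reference(reference_ppl, candidates):
    """Return the (bpw, perplexity) pair whose perplexity is nearest the reference.

    reference_ppl: perplexity measured on the 8.0 bpw quant.
    candidates: dict mapping bits-per-weight -> measured perplexity.
    """
    if not candidates:
        raise ValueError("need at least one candidate quant")
    # Smallest absolute gap to the reference wins; ties go to the first entry.
    return min(candidates.items(), key=lambda item: abs(item[1] - reference_ppl))


# Placeholder values for illustration only; substitute your own measurements.
reference_ppl = 6.10
candidates = {3.98: 6.31, 3.99: 6.27, 4.00: 6.29, 4.01: 6.24, 4.02: 6.26}

best_bpw, best_ppl = pick_closest_to_reference(reference_ppl, candidates)
print(f"closest to the 8.0 reference: {best_bpw} bpw (perplexity {best_ppl})")
```

With these placeholder numbers the 4.01 quant would be picked, which illustrates the point of the sweep: the target bitrate itself is not always the one that lands closest to the reference.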
