EXL2 quantization error / fine-tuning? #70
SinanAkkoyun asked this question in Q&A · Unanswered · 0 replies
Hey!
When quantizing, a quantization error occurs, which is minimized with the help of calibration data. When quantizing an already-trained full-precision model, that post-hoc calibration is the only option.
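Just to illustrate what I mean by calibration (a toy sketch, not the actual EXL2/GPTQ procedure; the helper names and the simple scale search are made up): the rounding parameters are picked so that the quantized layer's outputs stay close to the full-precision outputs on calibration activations, rather than just rounding the weights in isolation.

```python
import torch

def quantize(w, scale, qmax=7):
    # round to a symmetric low-bit grid and dequantize back
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

def calibrated_scale(w, calib_x, qmax=7):
    # search for the rounding scale that minimizes the *output* error on calibration data
    base = w.abs().max() / qmax
    best_scale, best_err = base, float("inf")
    for f in torch.linspace(0.5, 1.0, 20):
        s = base * f
        err = ((calib_x @ w.t() - calib_x @ quantize(w, s, qmax).t()) ** 2).mean()
        if err < best_err:
            best_scale, best_err = s, err.item()
    return best_scale

w = torch.randn(8, 16)          # stand-in weight matrix
calib_x = torch.randn(64, 16)   # calibration activations
scale = calibrated_scale(w, calib_x)
```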
However, when fine-tuning with QLoRA, one can (I think) export the quantized model directly.
Doing this kind of "quantization-aware training" while fine-tuning seems to promise more accurate fine-tuned models than calibrating afterwards.
Let's say one fine-tunes a model with GPTQ QLoRA and directly exports the quantized model. Would that model perform better than one that only received the whole fine-tuning dataset as calibration data?
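Here is roughly the contrast I have in mind, again as a toy sketch (the FakeQuantize/QATLinear names, the 4-bit grid, and the straight-through estimator are just for illustration, not how exllamav2 or any QLoRA implementation actually works): during quantization-aware fine-tuning the forward pass already sees the rounded weights, so the optimizer can compensate for the rounding error during training instead of it being corrected only afterwards.

```python
import torch
import torch.nn as nn

class FakeQuantize(torch.autograd.Function):
    """Round weights to a low-bit grid in the forward pass; pass gradients straight through."""
    @staticmethod
    def forward(ctx, w, scale, qmax=7):
        return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None, None  # straight-through estimator

class QATLinear(nn.Module):
    """Linear layer whose forward pass always uses the simulated quantized weights."""
    def __init__(self, in_features, out_features, bits=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.qmax = 2 ** (bits - 1) - 1

    def forward(self, x):
        scale = (self.weight.abs().max() / self.qmax).detach()
        w_q = FakeQuantize.apply(self.weight, scale, self.qmax)  # training "sees" rounded weights
        return x @ w_q.t()

layer = QATLinear(16, 8)
x = torch.randn(2, 16)
loss = layer(x).pow(2).mean()
loss.backward()  # gradients reach the full-precision weights despite the rounding
```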
I am just very interested in getting the most reliable outputs possible from quantized models, as I hope that with the right optimization the error could potentially be eliminated, but I'd like to get external expert opinions on that! :)