EXL2 quantization error / fine-tuning? #70
SinanAkkoyun asked this question in Q&A · Unanswered · 0 replies
Hey!
When quantizing, a quantization error occurs, which is minimized with the help of calibration data. When quantizing an already-trained full-precision model, that post-hoc calibration is the only option.
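Just to illustrate what I mean by calibration (a toy sketch, not the actual EXL2/GPTQ procedure; the helper names and the simple scale search are made up): the rounding parameters are picked so that the quantized layer's outputs stay close to the full-precision outputs on calibration activations, rather than just rounding the weights in isolation.

```python
import torch

def quantize(w, scale, qmax=7):
    # round to a symmetric low-bit grid and dequantize back
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

def calibrated_scale(w, calib_x, qmax=7):
    # search for the rounding scale that minimizes the *output* error on calibration data
    base = w.abs().max() / qmax
    best_scale, best_err = base, float("inf")
    for f in torch.linspace(0.5, 1.0, 20):
        s = base * f
        err = ((calib_x @ w.t() - calib_x @ quantize(w, s, qmax).t()) ** 2).mean()
        if err < best_err:
            best_scale, best_err = s, err.item()
    return best_scale

w = torch.randn(8, 16)          # stand-in weight matrix
calib_x = torch.randn(64, 16)   # calibration activations
scale = calibrated_scale(w, calib_x)
```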
However, when fine-tuning with QLoRA, one can (I think) export the quantized model directly.
Doing this kind of "quantization-aware training" while fine-tuning seems to promise more accurate fine-tuned models than calibrating afterwards.
Let's say one fine-tunes a model with GPTQ QLoRA and directly exports the quantized model. Would that model perform better than one that only received the whole fine-tuning dataset as calibration data?
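Here is roughly the contrast I have in mind, again as a toy sketch (the FakeQuantize/QATLinear names, the 4-bit grid, and the straight-through estimator are just for illustration, not how exllamav2 or any QLoRA implementation actually works): during quantization-aware fine-tuning the forward pass already sees the rounded weights, so the optimizer can compensate for the rounding error during training instead of it being corrected only afterwards.

```python
import torch
import torch.nn as nn

class FakeQuantize(torch.autograd.Function):
    """Round weights to a low-bit grid in the forward pass; pass gradients straight through."""
    @staticmethod
    def forward(ctx, w, scale, qmax=7):
        return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None, None  # straight-through estimator

class QATLinear(nn.Module):
    """Linear layer whose forward pass always uses the simulated quantized weights."""
    def __init__(self, in_features, out_features, bits=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.qmax = 2 ** (bits - 1) - 1

    def forward(self, x):
        scale = (self.weight.abs().max() / self.qmax).detach()
        w_q = FakeQuantize.apply(self.weight, scale, self.qmax)  # training "sees" rounded weights
        return x @ w_q.t()

layer = QATLinear(16, 8)
x = torch.randn(2, 16)
loss = layer(x).pow(2).mean()
loss.backward()  # gradients reach the full-precision weights despite the rounding
```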
I am just very interested in getting the most reliable outputs possible from quantized models, as I hope that with the right optimization the error could potentially be eliminated, but I'd like to get external expert opinions on that! :)