-
I have lots of plans for improved quantization; I just keep getting pulled in different directions. :) The latest distraction is Phi-3, but there are other issues that keep coming up. Quantization-aware finetuning is something I don't think I'd look at until I've experimented enough with all the most recent post-training quantization schemes. It's a very heavy-handed approach, and I don't think it's much of an exaggeration to say it's too slow for the rate at which models are being released right now. It simply can't take two weeks and cost a significant amount of money to quantize a model, because no one will care about that model by the time you're done.
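For contrast with quantization-aware finetuning, the cheapest post-training schemes need only a single gradient-free pass over the weights. Here is a minimal round-to-nearest (RTN) sketch in PyTorch, purely illustrative and not this project's actual scheme; the symmetric scaling and `group_size` value are assumptions:

```python
import torch

def quantize_rtn_4bit(w: torch.Tensor, group_size: int = 128):
    """Round-to-nearest 4-bit quantization with symmetric per-group scales.

    Illustrative only; practical PTQ schemes add calibration data and
    smarter rounding on top of this basic idea.
    """
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    groups = w.reshape(out_features, in_features // group_size, group_size)
    # Scale each group so its largest magnitude lands on 7, then clamp to
    # the signed int4 range [-8, 7].
    scales = (groups.abs().amax(dim=-1, keepdim=True) / 7.0).clamp(min=1e-8)
    q = torch.clamp(torch.round(groups / scales), -8, 7).to(torch.int8)
    return q, scales

def dequantize(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # Reverse the scaling and flatten the groups back into a 2-D weight.
    return (q.float() * scales).reshape(q.shape[0], -1)

w = torch.randn(4096, 4096)
q, scales = quantize_rtn_4bit(w)
w_hat = dequantize(q, scales)
print(f"mean abs reconstruction error: {(w - w_hat).abs().mean():.5f}")
```

A pass like this is essentially free compared to finetuning, which is why post-training approaches can keep pace with new model releases in a way gradient-based approaches can't.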
-
Hi all, thank you for your interest in the VPTQ project. I am also actively trying to integrate VPTQ (https://github.com/microsoft/VPTQ) into various inference frameworks. What would be required for me to integrate it into this project? Should I prepare a pull request? Yang
-
Any plans for a variant of post-training quantization to "heal" perplexity, i.e. recover from `1-bit` BitNet to the current `4-bit` perplexity? I mean post-training quantization methods such as `AQLM`, `Smoothquant+`, and `SqueezeLLM`. A variant would be needed, as most methods require significantly more VRAM and time for SFT. Or perhaps my prediction is wrong and you have other v3 plans?
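As a point of reference for the perplexity gap mentioned above, here is a minimal sketch of how that degradation is usually measured, assuming a Hugging Face-style causal LM; the model name and evaluation text are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def perplexity(model, tokenizer, text: str, ctx_len: int = 1024) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :ctx_len]
    # Passing labels=input_ids makes the model return the shifted
    # next-token cross-entropy loss; perplexity is just exp(loss).
    loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

tokenizer = AutoTokenizer.from_pretrained("gpt2")             # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
text = "The quick brown fox jumps over the lazy dog. " * 100  # placeholder text
print(f"ppl: {perplexity(model, tokenizer, text):.3f}")
# Running the same function on a quantized checkpoint shows the degradation
# that "healing" methods would try to recover.
```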