[feature request] lm_head quantization #2550
Labels
- Investigating
- Low Precision: issue about lower-bit quantization, including int8, int4, fp8
- triaged: issue has been triaged by maintainers
Recently, vocabulary sizes have been growing, and the `lm_head` weight exceeds 10 GB in some LLMs.
However, there is currently no way to quantize `lm_head`:
`modelopt.torch.export.postprocess.update_lm_head_quantization` ignores a manually specified `quant_cfg` and disables `lm_head` quantization during export (modelopt==0.19.0).
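For example, even re-enabling `lm_head` in the config gets overridden at export time. A minimal sketch of what I mean (the model name, the `calibrate` loop, and the exact quantizer settings are illustrative, not a verified repro):

```python
import copy

import torch
from transformers import AutoModelForCausalLM

import modelopt.torch.quantization as mtq

# Any large-vocab LLM; name is just an example.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# Start from a stock 4-bit config; its defaults disable lm_head via the
# "*lm_head*" pattern, so try to re-enable it manually.
cfg = copy.deepcopy(mtq.INT4_AWQ_CFG)
cfg["quant_cfg"]["*lm_head*"] = {"enable": True}

def calibrate(m):
    # Placeholder calibration loop; a real run would feed representative data.
    m(torch.randint(0, 32000, (1, 32)))

model = mtq.quantize(model, cfg, forward_loop=calibrate)

# On export, update_lm_head_quantization ignores the manual setting above
# and disables the lm_head quantizers again, so lm_head stays unquantized.
```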
Related issue: #1394