
[Feature Request] Better support for w4a8 quantization #2605

Open
ShuaiShao93 opened this issue Dec 20, 2024 · 3 comments

@ShuaiShao93

Based on this doc, we have to use DeepCompressor to prepare a fake-quantized checkpoint. However, setting up that repo is a lot of trouble, and the tool does not seem to be well maintained, especially for newer Llama models like 3.1/3.2. At least I was not able to run it successfully for Llama 3.1 8B.

It would be great if we could add more native support for w4a8 quantization in TensorRT-LLM.
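
(For context: below is a minimal sketch of what the w4a8 numerics look like, assuming per-group symmetric int4 weights and per-tensor int8 activations. The function names and group size are illustrative assumptions, not the TensorRT-LLM or DeepCompressor API.)

```python
# Illustrative sketch of w4a8 numerics only (per-group int4 weights,
# per-tensor int8 activations). Function names and group size are assumptions,
# not the TensorRT-LLM or DeepCompressor API.
import torch

def quantize_weight_int4(w: torch.Tensor, group_size: int = 128):
    """Symmetric per-group int4 quantization of a [out_features, in_features] weight."""
    out_ch, in_ch = w.shape
    groups = w.reshape(out_ch, in_ch // group_size, group_size)
    scale = (groups.abs().amax(dim=-1, keepdim=True) / 7.0).clamp(min=1e-8)  # int4 range [-8, 7]
    q = torch.clamp(torch.round(groups / scale), -8, 7)
    return q.to(torch.int8), scale  # int4 values stored in an int8 container

def quantize_act_int8(x: torch.Tensor):
    """Symmetric per-tensor int8 quantization of activations."""
    scale = (x.abs().amax() / 127.0).clamp(min=1e-8)
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

def fake_quant_linear(x: torch.Tensor, w: torch.Tensor, group_size: int = 128):
    """Simulate a w4a8 linear layer: quantize, dequantize, then run the matmul in float."""
    qw, w_scale = quantize_weight_int4(w, group_size)
    qx, x_scale = quantize_act_int8(x)
    w_dq = (qw.float() * w_scale).reshape(w.shape)
    x_dq = qx.float() * x_scale
    return x_dq @ w_dq.t()
```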

@nv-guomingz added the "Low Precision" label (Issue about lower bit quantization, including int8, int4, fp8) on Dec 23, 2024
@nv-guomingz
Collaborator

@Barry-Delaney would you please add comments here?

github-actions bot added the "triaged" (Issue has been triaged by maintainers) and "Investigating" labels on Dec 23, 2024
@bobboli
Collaborator

bobboli commented Dec 23, 2024

Hi,

Our officially supported quantization toolkit is ModelOpt. We have discussed this before and found that it is not trivial to land the techniques used by DeepCompressor (such as asymmetric quantization, double quantization, rotation, smoothing, etc.) in ModelOpt. At least in the near future, we need to rely on DeepCompressor.
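
(For context: a minimal sketch of two of the techniques mentioned above, asymmetric quantization and SmoothQuant-style smoothing. This is assumed example code for illustration, not ModelOpt or DeepCompressor internals.)

```python
# Assumed example code, not ModelOpt or DeepCompressor internals.
import torch

def asymmetric_quantize(w: torch.Tensor, n_bits: int = 4):
    """Asymmetric per-channel quantization: a zero-point shifts the integer grid
    so the full [min, max] range of each output channel is used."""
    qmin, qmax = 0, 2 ** n_bits - 1
    w_min = w.amin(dim=1, keepdim=True)
    w_max = w.amax(dim=1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
    return q, scale, zero_point  # dequantize with (q - zero_point) * scale

def smooth(x: torch.Tensor, w: torch.Tensor, alpha: float = 0.5):
    """SmoothQuant-style smoothing: migrate activation outliers into the weights
    with a per-input-channel scale; the layer output is unchanged because
    (x / s) @ (w * s).T == x @ w.T."""
    act_range = x.abs().amax(dim=0)                    # per input-channel activation range
    w_range = w.abs().amax(dim=0).clamp(min=1e-8)      # per input-channel weight range
    s = (act_range.pow(alpha) / w_range.pow(1 - alpha)).clamp(min=1e-8)
    return x / s, w * s
```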

If you run into problems quantizing new models, could you try implementing them yourself? DeepCompressor has an abstraction layer for models like this. You could also raise an issue in the DeepCompressor repository and paste the errors in detail; the DeepCompressor authors would be glad to answer.

Thank you!

@ShuaiShao93
Author

Thanks! I have filed mit-han-lab/deepcompressor#38.

BTW, if we can fix these issues: #2602, #2603, #2604, we could at least use w8a8. But today we can't even use w8a8.
