[Feature Request] Better support for w4a8 quantization #2605
Labels: Investigating · Low Precision (lower-bit quantization, including int8, int4, fp8) · triaged (issue has been triaged by maintainers)
Based on this doc, we have to use deepcompressor to prepare a fake-quantized checkpoint. However, setting up that repo is a lot of extra trouble, and the tool does not seem to be well maintained, especially for newer Llama models like 3.1/3.2. At least I was not able to do it successfully for Llama 3.1 8B.
It would be great if more native support for w4a8 quantization could be added to TensorRT-LLM.
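For context, here is a minimal sketch of what the fake-quantization step is doing conceptually for a w4a8 linear layer: weights are simulated at int4 with per-output-channel symmetric scales, activations at int8 with a per-tensor scale, and both are dequantized back to float before the matmul. This is a NumPy illustration under my own assumptions (function names, scale granularity) and is not deepcompressor's or TensorRT-LLM's actual implementation.

```python
import numpy as np

def fake_quant_symmetric(x, num_bits, axis=None):
    """Simulate symmetric quantization: round to num_bits, then dequantize.

    axis=None -> one scale for the whole tensor (per-tensor);
    axis=k    -> one scale per slice along that axis (e.g. per channel).
    """
    qmax = 2 ** (num_bits - 1) - 1  # 7 for int4, 127 for int8
    amax = np.max(np.abs(x), axis=axis, keepdims=axis is not None)
    scale = np.where(amax == 0, 1.0, amax / qmax)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # dequantized ("fake-quantized") values

def fake_w4a8_linear(activations, weights):
    """W4A8 linear: int4 per-output-channel weights, int8 per-tensor activations."""
    w_fq = fake_quant_symmetric(weights, num_bits=4, axis=1)  # one scale per row
    a_fq = fake_quant_symmetric(activations, num_bits=8)      # one scale overall
    return a_fq @ w_fq.T
```

The point of the request is that this simulation (plus the real low-bit kernels behind it) would be driven natively by TensorRT-LLM's own quantization flow instead of requiring a separate deepcompressor setup.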