
[Feature Request] Better support for w4a8 quantization #2605

Open
ShuaiShao93 opened this issue Dec 20, 2024 · 3 comments

@ShuaiShao93

Based on this doc, we have to use DeepCompressor to prepare a fake-quantized checkpoint. However, setting up that repo is a lot of trouble, and the tool does not seem to be well maintained, especially for newer Llama models like 3.1/3.2. At least I was not able to run it successfully for Llama 3.1 8B.

It would be great if we could add more native support for w4a8 quantization in TensorRT-LLM.
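
(For context: below is a minimal sketch of what the w4a8 numerics look like, assuming per-group symmetric int4 weights and per-tensor int8 activations. The function names and group size are illustrative assumptions, not the TensorRT-LLM or DeepCompressor API.)

```python
# Illustrative sketch of w4a8 numerics only (per-group int4 weights,
# per-tensor int8 activations). Function names and group size are assumptions,
# not the TensorRT-LLM or DeepCompressor API.
import torch

def quantize_weight_int4(w: torch.Tensor, group_size: int = 128):
    """Symmetric per-group int4 quantization of a [out_features, in_features] weight."""
    out_ch, in_ch = w.shape
    groups = w.reshape(out_ch, in_ch // group_size, group_size)
    scale = (groups.abs().amax(dim=-1, keepdim=True) / 7.0).clamp(min=1e-8)  # int4 range [-8, 7]
    q = torch.clamp(torch.round(groups / scale), -8, 7)
    return q.to(torch.int8), scale  # int4 values stored in an int8 container

def quantize_act_int8(x: torch.Tensor):
    """Symmetric per-tensor int8 quantization of activations."""
    scale = (x.abs().amax() / 127.0).clamp(min=1e-8)
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

def fake_quant_linear(x: torch.Tensor, w: torch.Tensor, group_size: int = 128):
    """Simulate a w4a8 linear layer: quantize, dequantize, then run the matmul in float."""
    qw, w_scale = quantize_weight_int4(w, group_size)
    qx, x_scale = quantize_act_int8(x)
    w_dq = (qw.float() * w_scale).reshape(w.shape)
    x_dq = qx.float() * x_scale
    return x_dq @ w_dq.t()
```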

@nv-guomingz added the "Low Precision" label (Issue about lower bit quantization, including int8, int4, fp8) on Dec 23, 2024
@nv-guomingz
Collaborator

@Barry-Delaney would you please add comments here?

github-actions bot added the "triaged" (Issue has been triaged by maintainers) and "Investigating" labels on Dec 23, 2024
@bobboli
Collaborator

bobboli commented Dec 23, 2024

Hi,

Our officially supported quantization toolkit is ModelOpt. We have discussed this before and found that it is not trivial to land the techniques used by DeepCompressor (such as asymmetric quantization, double quantization, rotation, smoothing, etc.) in ModelOpt. At least in the near future, we need to rely on DeepCompressor.
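
(For context: a minimal sketch of two of the techniques mentioned above, asymmetric quantization and SmoothQuant-style smoothing. This is assumed example code for illustration, not ModelOpt or DeepCompressor internals.)

```python
# Assumed example code, not ModelOpt or DeepCompressor internals.
import torch

def asymmetric_quantize(w: torch.Tensor, n_bits: int = 4):
    """Asymmetric per-channel quantization: a zero-point shifts the integer grid
    so the full [min, max] range of each output channel is used."""
    qmin, qmax = 0, 2 ** n_bits - 1
    w_min = w.amin(dim=1, keepdim=True)
    w_max = w.amax(dim=1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero_point, qmin, qmax)
    return q, scale, zero_point  # dequantize with (q - zero_point) * scale

def smooth(x: torch.Tensor, w: torch.Tensor, alpha: float = 0.5):
    """SmoothQuant-style smoothing: migrate activation outliers into the weights
    with a per-input-channel scale; the layer output is unchanged because
    (x / s) @ (w * s).T == x @ w.T."""
    act_range = x.abs().amax(dim=0)                    # per input-channel activation range
    w_range = w.abs().amax(dim=0).clamp(min=1e-8)      # per input-channel weight range
    s = (act_range.pow(alpha) / w_range.pow(1 - alpha)).clamp(min=1e-8)
    return x / s, w * s
```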

If you run into problems quantizing new models, could you try implementing them yourself? DeepCompressor has an abstraction layer for models like this. You could also raise an issue in the DeepCompressor repository and paste the errors in detail; the DeepCompressor authors would be glad to answer.

Thank you!

@ShuaiShao93
Author

Thanks! I have filed mit-han-lab/deepcompressor#38.

BTW, if we can fix these issues: #2602, #2603, #2604, we could at least use w8a8. But today we can't even use w8a8.
