Labels: Quantization (Issues related to Quantization or torchao), actionable (Items in the backlog waiting for an appropriate impl/fix), bug (Something isn't working)
Description
🚀 The feature, motivation and pitch
In the past, we padded weights for int4 quantization when the group size did not evenly divide the weight dimensions. Since we decided to remove that padding, int4 quantization is now simply skipped for incompatible shapes. Among other things, this means int4 quantization is no longer exercised by our tests, because the stories model uses weight dimensions that are not a multiple of 256:
```
Time to load model: 0.19 seconds
Quantizing the model with: {'executor': {'accelerator': 'cuda'}, 'precision': {'dtype': 'bf16'}, 'linear:int4': {'groupsize': 256}}
Skipping quantizing weight with int4 weight only quantization because the shape of weight torch.Size([288, 288]) is not compatible with group_size 256
Skipping quantizing weight with int4 weight only quantization because the shape of weight torch.Size([288, 288]) is not compatible with group_size 256
Skipping quantizing weight with int4 weight only quantization because the shape of weight torch.Size([288, 288]) is not compatible with group_size 256
Skipping quantizing weight with int4 weight only quantization because the shape of weight torch.Size([288, 288]) is not compatible with group_size 256
```
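For context, the skip happens because int4 weight-only quantization splits each weight row along the input dimension into contiguous groups that share one scale/zero-point, so `in_features` must be divisible by `group_size`. A minimal sketch of that compatibility check (the function name is hypothetical, not the actual torchao code):

```python
def int4_group_size_compatible(weight_shape, group_size):
    # int4 weight-only quantization groups `group_size` consecutive
    # values of each row under a shared scale/zero-point, so the input
    # dimension must divide evenly by the group size.
    _out_features, in_features = weight_shape
    return in_features % group_size == 0

# stories model: 288 is not a multiple of 256, so quantization is skipped
print(int4_group_size_compatible((288, 288), 256))  # False
print(int4_group_size_compatible((512, 512), 256))  # True
```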
Some options:
- replace stories with another model whose weight dimensions meet the group-size requirement
- add other tests for int4 quantization in tc
Alternatives
Put padding back into int4 quantization.
Yes, padding is not ideal, but silently skipping quantization is not ideal either. In my experience, just making things work increases utility for end users. If there is a real concern about performance (int4 quantization with padding may still beat no quantization at all!), pad and issue a warning to users.
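The pad-and-warn alternative could look something like the sketch below: round the input dimension up to the next multiple of the group size and zero-fill, warning the user. This is an illustrative sketch only (function name and exact warning text are hypothetical), not the previous torchchat implementation:

```python
import math
import warnings

def pad_weight_for_int4(weight, group_size):
    # weight: list of rows (each a list of floats). Zero-pads the inner
    # (in_features) dimension up to the next multiple of group_size so
    # int4 grouped quantization can proceed instead of being skipped.
    in_features = len(weight[0])
    if in_features % group_size == 0:
        return weight  # already compatible, nothing to do
    padded = math.ceil(in_features / group_size) * group_size
    warnings.warn(
        f"Padding weight in_features from {in_features} to {padded} "
        f"to satisfy int4 group_size {group_size}"
    )
    return [row + [0.0] * (padded - in_features) for row in weight]

# stories-style 288-wide weight with group_size 256 gets padded to 512
w = [[1.0] * 288 for _ in range(2)]
padded = pad_weight_for_int4(w, 256)
print(len(padded[0]))  # 512
```

Zero padding leaves the matmul result unchanged (the extra activations multiply zeros), so correctness is preserved at the cost of some wasted compute and memory.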
Additional context
No response
RFC (Optional)
No response