HfModel: Disable use_exllama by default for GPTQ models #1474
+17 −1
Describe your changes

The default value for `use_exllama` in transformers is `True`. However, an exllama model cannot be loaded on CPU (as needed for model export) and does not have a backward pass implemented for finetuning. Since the main uses for GPTQ-quantized models in Olive are export and finetuning, we should disable `use_exllama` by default. Users can still provide `use_exllama=True` as part of the loading args if they want to enable exllama for inference, etc. See the sketch after the checklist below for what this corresponds to at the transformers level.

Checklist before requesting a review
`lintrunner -a`
(Optional) Issue link
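
For context, a minimal sketch of what disabling exllama looks like when loading a GPTQ model with transformers; this is not the code from this PR, and the model id and `bits=4` are placeholder assumptions:

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# Override the quantization config so exllama kernels are not used.
# `bits=4` assumes a 4-bit GPTQ checkpoint.
gptq_config = GPTQConfig(bits=4, use_exllama=False)

# With exllama disabled, the quantized model can be loaded on CPU
# (e.g. for export) and has a usable backward pass for finetuning.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-gptq-model",  # placeholder GPTQ model id
    quantization_config=gptq_config,
    device_map="cpu",
)
```

Passing `quantization_config` at load time overrides the value saved in the checkpoint's config, which is why leaving it unset would otherwise fall back to the transformers default of `use_exllama=True`.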