Quantize: CLI command to quantize input model #1356
base: main
Conversation
(force-pushed from ff267a3 to 6f7cadc)
Can you also add a unit test for this?
Some feedback....
An E2E here would be to run Quantization -> Capture the ONNX Graph for use in ORT. I therefore tried to take the output of quantization and run it through capture-onnx-graph. Firstly, I tried the Dynamo exporter option:

```
olive capture-onnx-graph \
    --model_name_or_path models/qwen-awq/awq/cpu-cpu_model/model \
    --use_dynamo_exporter True \
    --use_ort_genai True \
    --output_path models/qwen-awq/captured \
    --device cpu \
    --log_level 1
```

This hit the following error:
So, next I tried the Model Builder option. This worked... BUT it is not clear what really happened - I had to set the
Ironically, I ran into the same problem initially. I had some thoughts about addressing this at a larger level. Olive already knows the dependencies for each Pass; they are defined in olive_config.json. We could potentially iterate and verify that those dependencies are present even before we start running the passes. That avoids a long wait before a workflow fails after running a few expensive passes. Thoughts?
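For illustration, a rough sketch of that idea, assuming olive_config.json exposes a per-pass list of required modules (the schema, helper name, and pass names below are assumptions, not the actual Olive config format):

```python
import importlib.util
import json


def find_missing_pass_dependencies(config_path, pass_names):
    """Return (pass, module) pairs whose modules are not importable, before any pass runs."""
    with open(config_path) as f:
        olive_config = json.load(f)

    missing = []
    for pass_name in pass_names:
        # Assumed schema: each pass entry lists the Python modules it depends on.
        deps = olive_config.get("passes", {}).get(pass_name, {}).get("module_dependencies", [])
        missing.extend((pass_name, mod) for mod in deps if importlib.util.find_spec(mod) is None)
    return missing


# Fail fast before running any expensive pass (pass names shown are illustrative).
missing = find_missing_pass_dependencies("olive_config.json", ["AutoAWQQuantizer", "GptqQuantizer"])
if missing:
    raise RuntimeError(f"Missing dependencies for requested passes: {missing}")
```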
The data_config requirement is removed in a follow-up commit. The command-line arguments are in line with the finetune command, i.e. data_name, train_subset, eval_subset, etc.
There can never be enough information in help. :) One thing could be argued to be more important than the other. I propose providing a link to each specific algorithm's documentation.
I will add the available options to the choices list.
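For example, a minimal argparse sketch of surfacing those options through choices so they show up in --help (the wiring here is assumed, not the PR's actual code):

```python
from argparse import ArgumentParser

parser = ArgumentParser("olive quantize")
parser.add_argument(
    "--algorithms",
    nargs="+",
    choices=["awq", "gptq"],  # supported algorithms become visible in --help and are validated early
    required=True,
    help="Quantization algorithm(s) to run; see each algorithm's documentation for details.",
)

args = parser.parse_args(["--algorithms", "awq", "gptq"])
print(args.algorithms)  # ['awq', 'gptq']
```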
As I understand the intent, CLI commands are meant to "do one job only". For evaluation, we might introduce a separate CLI command that users can chain with this one.
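As a rough illustration of that chaining idea, reusing the capture-onnx-graph command tried earlier in this thread (the paths and output layout are hypothetical):

```
# Step 1: quantize (one job)
olive quantize \
    -m <model-name> \
    --algorithms awq \
    -o models/quantized

# Step 2: chain a separate command on the quantized output (one job)
olive capture-onnx-graph \
    --model_name_or_path models/quantized/awq/cpu-cpu_model/model \
    --output_path models/captured
```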
(force-pushed from 6ec5767 to 7cc7299)
(force-pushed from 7cc7299 to 4abfed8)
All comments/inputs addressed.
(force-pushed from 4abfed8 to 8b75d7b)
(force-pushed from e9afcd9 to f38fa1c)
Usage:

```
olive quantize \
    -m <model-name> \
    --trust_remote_code \
    --device <cpu|gpu|npu> \
    --algorithms <awq,gptq> \
    --data_name <data-name> \
    --train_subset <subset-name> \
    --batch_size <batch-size> \
    --tempdir <temp-dir> \
    -o <output-dir>
```
(force-pushed from f38fa1c to 1c6f758)
```diff
@@ -37,6 +38,7 @@ def get_cli_parser(called_as_console_script: bool = True) -> ArgumentParser:
     PerfTuningCommand.register_subcommand(commands_parser)
     ConfigureQualcommSDKCommand.register_subcommand(commands_parser)
     ManageAMLComputeCommand.register_subcommand(commands_parser)
+    QuantizeCommand.register_subcommand(commands_parser)
```
Maybe you could move this somewhere higher (35?). manage aml and cloud cache don't run workflows, so this order feels a bit weird in the documentation.
```python
if is_remote_run(self.args):
    return

output_path = Path(self.args.output_path)
```
Line 410 in 180dffb:

```python
def save_output_model(config: Dict, output_model_dir: Union[str, Path]):
```

The `algorithm_name` nesting doesn't exist.
```python
)

# dataset options
add_dataset_options(sub_parser)
```
The train/eval names don't make sense in this scenario. Maybe the option to have just one split/subset, or one each for train/eval, can be a parameter for this function, like in add_input_model_options.
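A sketch of what that parameterization could look like (the function exists in the PR per the diff above, but this signature and these option names are assumptions modeled on the usage elsewhere in this thread):

```python
from argparse import ArgumentParser


def add_dataset_options(sub_parser: ArgumentParser, single_subset: bool = False):
    """Add dataset options; optionally expose one subset instead of separate train/eval splits."""
    group = sub_parser.add_argument_group("dataset options")
    group.add_argument("--data_name", type=str, required=True, help="Huggingface dataset name.")
    if single_subset:
        group.add_argument("--subset", type=str, help="Single dataset subset to use (e.g. for calibration).")
    else:
        group.add_argument("--train_subset", type=str, help="Subset to use for training.")
        group.add_argument("--eval_subset", type=str, help="Subset to use for evaluation.")
    group.add_argument("--batch_size", type=int, default=1, help="Batch size.")
```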
```python
mock_tempdir.return_value = tmpdir.resolve()
mock_run.return_value = {}

for algo_name in algorithm_names:
```
Related to the above comment: this nesting doesn't exist if there is only one pass_flow.

Line 250 in 180dffb:

```
A. One pass flow:
```
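For context, a minimal sketch of the kind of mocked unit test being discussed here (patch targets and the CLI entry point are assumptions; the real test in this PR may differ):

```python
from unittest.mock import patch

import pytest

ALGORITHM_NAMES = ["awq", "gptq"]


@pytest.mark.parametrize("algo_name", ALGORITHM_NAMES)
@patch("olive.workflows.run")           # assumed patch target for the workflow runner
@patch("tempfile.TemporaryDirectory")   # mirrors mock_tempdir in the snippet above
def test_quantize_command(mock_tempdir, mock_run, algo_name, tmp_path):
    mock_tempdir.return_value = tmp_path.resolve()
    mock_run.return_value = {}

    # Hypothetical CLI invocation; with a single pass flow per run there is no
    # algorithm_name nesting, so output is expected directly under -o.
    from olive.cli.launcher import main  # assumed entry point

    main(["quantize", "-m", "dummy-model", "--algorithms", algo_name, "-o", str(tmp_path / "out")])

    mock_run.assert_called_once()
```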
Quantize: CLI command to quantize input model
Usage:

```
olive quantize --m --device <cpu|gpu> --algorithms <awq,gptq> --data_config_path -o
```
A few other code improvements:
Checklist before requesting a review
- `lintrunner -a`
- (Optional) Issue link