Olive-ai 0.5.0
Examples
The following examples have been added:
- Audio Spectrogram Transformer optimization #762
- Bert SNPE #925
- Llama2 GenAI #940
- Llama2 notebook tutorial #798
- MobileNet optimization with QDQ Quantization on Qualcomm NPU #874
- Phi2 Generation #979
- Phi2 optimization with different precision #938
- Stable Diffusion OpenVINO example #853
Passes (optimization techniques)
New Passes
- PyTorch
  - Introduce `GenAIModelExporter` pass to export a PyTorch model using the GenAI exporter.
  - Introduce `LoftQ` pass, which performs model fine-tuning using the LoftQ initialization proposed in https://arxiv.org/abs/2310.08659.
- ONNXRuntime
  - Introduce `DynamicToFixedShape` pass to convert dynamic shapes to fixed shapes in an ONNX model.
  - Introduce `OnnxOpVersionConversion` pass to convert an existing ONNX model to another target opset (see the sketch after this list).
  - [QNN-EP] Add the `prepare_qnn_config: bool` option for quantization under QNN-EP, where int16/uint16 are supported for both weights and activations.
  - [QNN-EP] Introduce `QNNPreprocess` pass to preprocess the model before quantization.
- QNN
  - Introduce `QNNConversion` pass to convert a model to a QNN C++ model.
  - Introduce `QNNContextBinaryGenerator` pass to generate a context binary from a compiled model library using a specific backend.
  - Introduce `QNNModelLibGenerator` pass to compile the C++ model into a model library for the desired target.
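
New passes are enabled the same way as existing ones: by adding an entry to the `passes` section of a workflow config. The sketch below wires up `OnnxOpVersionConversion`; it is a minimal illustration that assumes the standard workflow layout, and the `target_opset` option name is an assumption rather than a documented parameter.

```python
# Minimal sketch: running the new OnnxOpVersionConversion pass on an ONNX
# model. The "target_opset" option name is an assumption for illustration;
# consult the pass documentation for the exact schema.
from olive.workflows import run as olive_run

config = {
    "input_model": {
        "type": "ONNXModel",
        "config": {"model_path": "model.onnx"},
    },
    "passes": {
        "opset_conversion": {
            "type": "OnnxOpVersionConversion",
            "config": {"target_opset": 17},  # assumed option name
        }
    },
    "engine": {"output_dir": "outputs"},
}

olive_run(config)
```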
Updates
- OnnxConversion
  - Support both `past_key_values.index.key/value` and `past_key_value.index`.
- OptimumConversion
  - Provide the `components` parameter for users who want to export only some models, such as `decoder_model` and `decoder_with_past_model` (see the sketch after this list).
  - Use the default exporter args and behavior of the underlying optimum version. For versions 1.14.0+, this means `legacy=False` and `no_post_process=False`; users must provide them via `extra_args` if legacy behavior is desired.
- OpenVINO
  - Upgrade the OpenVINO API to 2023.2.0.
- OrtPerfTuning
  - Add `tunable_op_enable` and `tunable_op_tuning_enable` for the ROCm EP to speed up performance.
- LoRA/QLoRA
  - Support bfloat16 with ort-training.
  - Support resuming training from a checkpoint via the `resume_from_checkpoint` and `overwrite_output_dir` options.
- MoEExpertsDistributor
  - Add an option to configure the number of parallel jobs.
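
For the OptimumConversion updates above, a pass entry might look like the following. The `components` and `extra_args` names come from these notes; the values and surrounding layout are illustrative assumptions.

```python
# Sketch of an OptimumConversion pass entry using the new "components"
# parameter and "extra_args". Option names are from the release notes;
# the values and surrounding layout are illustrative assumptions.
optimum_conversion = {
    "type": "OptimumConversion",
    "config": {
        # export only the listed models
        "components": ["decoder_model", "decoder_with_past_model"],
        # restore the pre-1.14.0 exporter behavior if desired
        "extra_args": {"legacy": True, "no_post_process": True},
    },
}
```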
Engine
- For Zipfile packaging, add a model rank JSON file. This file ranks all output models from the different EPs and includes each model's `model_config` and metrics.
- Add the Auto Optimizer, a tool that automatically searches for a combination of Olive passes (see the sketch below).
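
These notes do not spell out the Auto Optimizer's configuration surface. As a rough, hypothetical sketch, one might omit the explicit `passes` section and let the optimizer search; the `auto_optimizer_config` key and its `precisions` field below are assumptions, not confirmed API.

```python
# Hypothetical sketch: no explicit "passes" section, so the Auto Optimizer
# searches for a pass combination. "auto_optimizer_config" and "precisions"
# are assumed names, not confirmed from these notes.
config = {
    "input_model": {
        "type": "PyTorchModel",
        "config": {"hf_config": {"model_name": "gpt2", "task": "text-generation"}},
    },
    "auto_optimizer_config": {"precisions": ["fp16"]},  # assumed field
    "engine": {"output_dir": "outputs"},
}
```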
System
- Add `hf_token` support for Olive systems.
- AzureMLSystem
  - The Olive config file is now uploaded to AML jobs under the codes folder.
  - Support adding tags to the AML jobs.
  - Support using an existing AML workspace Environment for AzureMLSystem.
- DockerSystem
  - Support running Olive passes.
- `PythonEnvironmentSystem` now requires Olive to be installed in the environment. It can run passes and evaluate models.
- New `IsolatedORTSystem` introduced that only supports evaluation of ONNX models. It requires onnxruntime to be installed in the environment and can be used for packages like onnxruntime-qnn, which can only be run in a Windows ARM64 Python environment (see the sketch after this list).
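
As a rough illustration, the two environment-based systems above might be declared in the `systems` section of a workflow config as follows. The `type` strings and the `python_environment_path` field are assumptions for illustration; `hf_token` is the new option named above.

```python
# Sketch of "systems" entries reflecting the changes above. The "type"
# strings and "python_environment_path" field are assumptions for
# illustration; "hf_token" is the new option from these notes.
systems = {
    "python_env": {
        "type": "PythonEnvironment",
        "config": {
            "python_environment_path": "/path/to/venv/bin",  # Olive must be installed here
            "hf_token": True,  # new hf_token support
        },
    },
    "isolated_ort": {
        # evaluation-only, e.g. onnxruntime-qnn on Windows ARM64
        "type": "IsolatedORT",
        "config": {"python_environment_path": "C:/qnn-venv/Scripts"},
    },
}
```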
Data
- Add AML resource support for data configs.
- Add audio classification data preprocess function.
Model
- Rename `model_loading_args` to `from_pretrained_args` in `hf_config` (see the sketch below).
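
A minimal sketch of the renamed field in an input model's `hf_config`; the model name and `torch_dtype` value are placeholders.

```python
# Sketch: "from_pretrained_args" replaces the old "model_loading_args" in
# hf_config. The model name and torch_dtype value are placeholders.
input_model = {
    "type": "PyTorchModel",
    "config": {
        "hf_config": {
            "model_name": "meta-llama/Llama-2-7b-hf",
            "task": "text-generation",
            "from_pretrained_args": {"torch_dtype": "float16"},  # was model_loading_args
        }
    },
}
```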
Metrics
- Add `throughput` metric support (see the sketch below).
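
A hedged sketch of what a throughput metric entry might look like in an evaluator config; the sub-type name `avg` mirrors the latency metric and is an assumption.

```python
# Sketch of an evaluator metric entry using the new "throughput" type.
# The sub-type name "avg" mirrors the latency metric and is an assumption.
throughput_metric = {
    "name": "throughput",
    "type": "throughput",
    "sub_types": [{"name": "avg", "priority": 1}],
}
```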
Dependencies:
Support onnxruntime 1.17.1.