Add information about ONNX backend to the docs (openvinotoolkit#23096)
### Details:
- Quantization with accuracy control supports the ONNX backend. This change
documents how to use quantization with accuracy control with
`onnx.ModelProto`.

### Tickets:
 - *133388*

---------

Co-authored-by: Tatiana Savina <[email protected]>
andrey-churkin and tsavina authored Mar 1, 2024
1 parent 7d35d2a commit 06a669e
Showing 2 changed files with 108 additions and 3 deletions.
@@ -14,7 +14,7 @@ This is the advanced quantization flow that allows to apply 8-bit quantization t
* Since accuracy validation is run several times during the quantization process, quantization with accuracy control can take more time than the :doc:`Basic 8-bit quantization <basic_quantization_flow>` flow.
* The resulting model can provide a smaller performance improvement than the :doc:`Basic 8-bit quantization <basic_quantization_flow>` flow because some of the operations are kept in the original precision.

.. note:: Currently, 8-bit quantization with accuracy control is available only for models in OpenVINO representation.
.. note:: Currently, 8-bit quantization with accuracy control is available only for models in OpenVINO and ``onnx.ModelProto`` representations.

The steps for the quantization with accuracy control are described below.

@@ -38,10 +38,18 @@ This step is similar to the :doc:`Basic 8-bit quantization <basic_quantization_f
         :language: python
         :fragment: [dataset]

   .. tab-item:: ONNX
      :sync: onnx

      .. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_aa_onnx.py
         :language: python
         :fragment: [dataset]

Prepare validation function
############################

Validation function receives ``openvino.CompiledModel`` object and validation dataset and returns accuracy metric value. The following code snippet shows an example of validation function for OpenVINO model:
The validation function takes two arguments, a model object and a validation dataset, and returns the accuracy metric value. The type of the model object depends on the framework: in OpenVINO it is an ``openvino.CompiledModel``, and in ONNX it is an ``onnx.ModelProto``.
The following code snippets show example validation functions for the OpenVINO and ONNX frameworks:

.. tab-set::

@@ -52,10 +60,17 @@ Validation function receives ``openvino.CompiledModel`` object and validation da
         :language: python
         :fragment: [validation]

   .. tab-item:: ONNX
      :sync: onnx

      .. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_aa_onnx.py
         :language: python
         :fragment: [validation]

Run quantization with accuracy control
#######################################

``nncf.quantize_with_accuracy_control()`` function is used to run the quantization with accuracy control. The following code snippet shows an example of quantization with accuracy control for OpenVINO model:
The ``nncf.quantize_with_accuracy_control()`` function runs quantization with accuracy control. The following code snippets show examples of quantization with accuracy control for the OpenVINO and ONNX frameworks:

.. tab-set::

@@ -66,6 +81,13 @@ Run quantization with accuracy control
         :language: python
         :fragment: [quantization]

   .. tab-item:: ONNX
      :sync: onnx

      .. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_aa_onnx.py
         :language: python
         :fragment: [quantization]

* ``max_drop`` defines the accuracy drop threshold. The quantization process stops when the degradation of the accuracy metric on the validation dataset is less than ``max_drop``. The default value is 0.01. NNCF stops the quantization and reports an error if the ``max_drop`` value can't be reached.

* ``drop_type`` defines how the accuracy drop will be calculated: ``ABSOLUTE`` (used by default) or ``RELATIVE``.
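
To make the two ``drop_type`` options concrete, the sketch below computes the drop both ways for one pair of metric values. It is an illustration only, not NNCF's implementation: the helper name ``accuracy_drop`` and the sample metric values are hypothetical, and the relative drop is assumed to be measured as a fraction of the original model's metric.

```python
# Illustrative only: a hypothetical helper, not part of the NNCF API.
def accuracy_drop(original_metric: float, quantized_metric: float,
                  relative: bool = False) -> float:
    """Return how much the metric degraded after quantization."""
    absolute = original_metric - quantized_metric
    if relative:
        # Assumption: the relative drop is a fraction of the original metric.
        return absolute / original_metric
    return absolute


original, quantized = 0.50, 0.494  # made-up accuracy values
max_drop = 0.01

abs_drop = accuracy_drop(original, quantized)                 # roughly 0.006
rel_drop = accuracy_drop(original, quantized, relative=True)  # roughly 0.012

print(f"absolute drop {abs_drop:.4f} within max_drop: {abs_drop <= max_drop}")
print(f"relative drop {rel_drop:.4f} within max_drop: {rel_drop <= max_drop}")
```

With these numbers the same quantized model satisfies a 0.01 threshold under the absolute rule but not under the relative one, so the choice of ``drop_type`` should match how the metric is usually reported.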
@@ -81,6 +103,13 @@ After that the model can be compiled and run with OpenVINO:
         :language: python
         :fragment: [inference]

   .. tab-item:: ONNX
      :sync: onnx

      .. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_aa_onnx.py
         :language: python
         :fragment: [inference]

To save the model in the OpenVINO Intermediate Representation (IR), use ``openvino.save_model()``. When dealing with an original model in FP32 precision, it's advisable to preserve FP32 precision in the most impactful model operations that were reverted from INT8 to FP32. To do this, consider using ``compress_to_fp16=False`` during the saving process. This recommendation is based on the default functionality of ``openvino.save_model()``, which saves models in FP16, potentially impacting accuracy through this conversion.
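
As a rough illustration of why the FP16 conversion can matter, the sketch below round-trips a single value through IEEE 754 half precision using only the Python standard library. It does not call OpenVINO; the helper ``to_fp16_and_back`` and the sample value are hypothetical and only show the rounding that any FP32-to-FP16 compression introduces.

```python
import struct


def to_fp16_and_back(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    # The "e" format code packs a value as a 16-bit float.
    return struct.unpack("<e", struct.pack("<e", value))[0]


weight = 0.1  # made-up weight value
restored = to_fp16_and_back(weight)
error = abs(weight - restored)

print(f"original: {weight!r}, after FP16: {restored!r}, error: {error:.2e}")
```

A single rounding error like this is tiny, but it is applied to every weight that is stored in FP16, which is why keeping the sensitive operations in FP32 can help preserve the validated accuracy.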

.. tab-set::
@@ -101,6 +130,7 @@ Examples of NNCF post-training quantization with control of accuracy metric:

* `Post-Training Quantization of Anomaly Classification OpenVINO model with control of accuracy metric <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control>`__
* `Post-Training Quantization of YOLOv8 OpenVINO Model with control of accuracy metric <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control>`__
* `Post-Training Quantization of YOLOv8 ONNX Model with control of accuracy metric <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/onnx/yolov8_quantize_with_accuracy_control>`__

See also
####################
75 changes: 75 additions & 0 deletions docs/optimization_guide/nncf/ptq/code/ptq_aa_onnx.py
@@ -0,0 +1,75 @@
# Copyright (C) 2018-2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

#! [dataset]
import nncf
import torch

calibration_loader = torch.utils.data.DataLoader(...)

def transform_fn(data_item):
    images, _ = data_item
    return {input_name: images.numpy()} # input_name should be taken from the model,
                                        # e.g. model.graph.input[0].name

calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
validation_dataset = nncf.Dataset(calibration_loader, transform_fn)
#! [dataset]

#! [validation]
import numpy as np
import torch
from sklearn.metrics import accuracy_score

import onnx
import onnxruntime


def validate(model: onnx.ModelProto,
             validation_loader: torch.utils.data.DataLoader) -> float:
    predictions = []
    references = []

    input_name = model.graph.input[0].name
    serialized_model = model.SerializeToString()
    session = onnxruntime.InferenceSession(serialized_model, providers=["CPUExecutionProvider"])
    output_names = [output.name for output in session.get_outputs()]

    for images, target in validation_loader:
        pred = session.run(output_names, input_feed={input_name: images.numpy()})[0]
        predictions.append(np.argmax(pred, axis=1))
        references.append(target)

    predictions = np.concatenate(predictions, axis=0)
    references = np.concatenate(references, axis=0)
    return accuracy_score(predictions, references)
#! [validation]

#! [quantization]
import onnx

model = onnx.load("model_path")

quantized_model = nncf.quantize_with_accuracy_control(
    model,
    calibration_dataset=calibration_dataset,
    validation_dataset=validation_dataset,
    validation_fn=validate,
    max_drop=0.01,
    drop_type=nncf.DropType.ABSOLUTE,
)
#! [quantization]

#! [inference]
import openvino as ov

# convert ONNX model to OpenVINO model
ov_quantized_model = ov.convert_model(quantized_model)

# compile the model to transform quantized operations to int8
model_int8 = ov.compile_model(ov_quantized_model)

input_fp32 = ... # FP32 model input
res = model_int8(input_fp32)

#! [inference]
