Add information about ONNX backend to the docs (openvinotoolkit#23096)

### Details:
- Quantization with accuracy control supports the ONNX backend. This change adds information on how to use it with `onnx.ModelProto`.

### Tickets:
- *133388*

---------

Co-authored-by: Tatiana Savina <[email protected]>
kumaripa-13 · Mar 1, 2024 · 06a669e
1 parent 7d35d2a
Showing 2 changed files with 108 additions and 3 deletions.
diff --git a/...tion-guide/quantizing-models-post-training/quantizing-with-accuracy-control.rst b/...tion-guide/quantizing-models-post-training/quantizing-with-accuracy-control.rst
@@ -14,7 +14,7 @@ This is the advanced quantization flow that allows to apply 8-bit quantization t
 * Since accuracy validation is run several times during the quantization process, quantization with accuracy control can take more time than the :doc:`Basic 8-bit quantization <basic_quantization_flow>` flow.
 * The resulted model can provide smaller performance improvement than the :doc:`Basic 8-bit quantization <basic_quantization_flow>` flow because some of the operations are kept in the original precision.
 
-.. note:: Currently, 8-bit quantization with accuracy control is available only for models in OpenVINO representation.
+.. note:: Currently, 8-bit quantization with accuracy control is available only for models in the OpenVINO and ``onnx.ModelProto`` representations.
 
 The steps for the quantization with accuracy control are described below.
 
@@ -38,10 +38,18 @@ This step is similar to the :doc:`Basic 8-bit quantization <basic_quantization_f
          :language: python
          :fragment: [dataset]
 
+   .. tab-item:: ONNX
+      :sync: onnx
+
+      .. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_aa_onnx.py
+         :language: python
+         :fragment: [dataset]
+
 Prepare validation function
 ############################
 
-Validation function receives ``openvino.CompiledModel`` object and validation dataset and returns accuracy metric value. The following code snippet shows an example of validation function for OpenVINO model:
+The validation function takes two arguments, a model object and a validation dataset, and returns the accuracy metric value. The type of the model object depends on the framework: ``openvino.CompiledModel`` for OpenVINO and ``onnx.ModelProto`` for ONNX.
+The following code snippets show example validation functions for the OpenVINO and ONNX frameworks:
 
 .. tab-set::
 
@@ -52,10 +60,17 @@ Validation function receives ``openvino.CompiledModel`` object and validation da
          :language: python
          :fragment: [validation]
 
+   .. tab-item:: ONNX
+      :sync: onnx
+
+      .. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_aa_onnx.py
+         :language: python
+         :fragment: [validation]
+
 Run quantization with accuracy control
 #######################################
 
-``nncf.quantize_with_accuracy_control()`` function is used to run the quantization with accuracy control. The following code snippet shows an example of quantization with accuracy control for OpenVINO model:
+The ``nncf.quantize_with_accuracy_control()`` function runs quantization with accuracy control. The following code snippets show examples for the OpenVINO and ONNX frameworks:
 
 .. tab-set::
 
@@ -66,6 +81,13 @@ Run quantization with accuracy control
          :language: python
          :fragment: [quantization]
 
+   .. tab-item:: ONNX
+      :sync: onnx
+
+      .. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_aa_onnx.py
+         :language: python
+         :fragment: [quantization]
+
 * ``max_drop`` defines the accuracy drop threshold. The quantization process stops when the degradation of accuracy metric on the validation dataset is less than the ``max_drop``. The default value is 0.01. NNCF will stop the quantization and report an error if the ``max_drop`` value can't be reached.
 
 * ``drop_type`` defines how the accuracy drop will be calculated: ``ABSOLUTE`` (used by default) or ``RELATIVE``.
@@ -81,6 +103,13 @@ After that the model can be compiled and run with OpenVINO:
          :language: python
          :fragment: [inference]
 
+   .. tab-item:: ONNX
+      :sync: onnx
+
+      .. doxygensnippet:: docs/optimization_guide/nncf/ptq/code/ptq_aa_onnx.py
+         :language: python
+         :fragment: [inference]
+
 To save the model in the OpenVINO Intermediate Representation (IR), use ``openvino.save_model()``. When dealing with an original model in FP32 precision, it's advisable to preserve FP32 precision in the most impactful model operations that were reverted from INT8 to FP32. To do this, consider using compress_to_fp16=False during the saving process. This recommendation is based on the default functionality of ``openvino.save_model()``, which saves models in FP16, potentially impacting accuracy through this conversion.
 
 .. tab-set::
@@ -101,6 +130,7 @@ Examples of NNCF post-training quantization with control of accuracy metric:
 
 * `Post-Training Quantization of Anomaly Classification OpenVINO model with control of accuracy metric <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control>`__
 * `Post-Training Quantization of YOLOv8 OpenVINO Model with control of accuracy metric <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control>`__
+* `Post-Training Quantization of YOLOv8 ONNX Model with control of accuracy metric <https://github.com/openvinotoolkit/nncf/blob/develop/examples/post_training_quantization/onnx/yolov8_quantize_with_accuracy_control>`__
 
 See also
 ####################
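
Note on `max_drop` and `drop_type`: the bullets in the first file name the two drop types but do not spell out how they differ. The sketch below is a minimal, NNCF-independent illustration. It assumes the usual meaning of the terms (absolute: difference of the two metric values; relative: that difference divided by the original metric); these formulas are an assumption and are not taken from the NNCF source. `DropType` here is a local stand-in for `nncf.DropType`, and `accuracy_drop` and `within_budget` are hypothetical helper names.

```python
from enum import Enum


class DropType(Enum):
    """Local stand-in for nncf.DropType so the sketch runs without NNCF."""
    ABSOLUTE = "absolute"
    RELATIVE = "relative"


def accuracy_drop(original: float, quantized: float, drop_type: DropType) -> float:
    """Return the accuracy drop under the assumed definitions (not NNCF code)."""
    if drop_type is DropType.ABSOLUTE:
        # Assumed: plain difference between the two metric values.
        return original - quantized
    if original == 0:
        raise ValueError("relative drop is undefined for a zero original metric")
    # Assumed: difference expressed as a fraction of the original metric.
    return (original - quantized) / original


def within_budget(original: float, quantized: float,
                  max_drop: float = 0.01,
                  drop_type: DropType = DropType.ABSOLUTE) -> bool:
    """True if the drop does not exceed max_drop."""
    return accuracy_drop(original, quantized, drop_type) <= max_drop


if __name__ == "__main__":
    # The same pair of metric values can pass one drop type and fail the other.
    for kind in DropType:
        print(kind.name, within_budget(0.80, 0.76, max_drop=0.045, drop_type=kind))
```

Under these assumed definitions the two drop types diverge whenever the original metric is not 1.0, so the same `max_drop` value is stricter in one mode than in the other.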

diff --git a/docs/optimization_guide/nncf/ptq/code/ptq_aa_onnx.py b/docs/optimization_guide/nncf/ptq/code/ptq_aa_onnx.py
@@ -0,0 +1,75 @@
+# Copyright (C) 2018-2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+#! [dataset]
+import nncf
+import torch
+
+calibration_loader = torch.utils.data.DataLoader(...)
+
+def transform_fn(data_item):
+    images, _ = data_item
+    # input_name should be taken from the model, e.g. model.graph.input[0].name
+    return {input_name: images.numpy()}
+
+calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
+validation_dataset = nncf.Dataset(calibration_loader, transform_fn)
+#! [dataset]
+
+#! [validation]
+import numpy as np
+import torch
+from sklearn.metrics import accuracy_score
+
+import onnx
+import onnxruntime
+
+
+def validate(model: onnx.ModelProto,
+             validation_loader: torch.utils.data.DataLoader) -> float:
+    predictions = []
+    references = []
+
+    input_name = model.graph.input[0].name
+    serialized_model = model.SerializeToString()
+    session = onnxruntime.InferenceSession(serialized_model, providers=["CPUExecutionProvider"])
+    output_names = [output.name for output in session.get_outputs()]
+
+    for images, target in validation_loader:
+        pred = session.run(output_names, input_feed={input_name: images.numpy()})[0]
+        predictions.append(np.argmax(pred, axis=1))
+        references.append(target)
+
+    predictions = np.concatenate(predictions, axis=0)
+    references = np.concatenate(references, axis=0)
+    return accuracy_score(references, predictions)  # sklearn order: (y_true, y_pred)
+#! [validation]
+
+#! [quantization]
+import onnx
+
+model = onnx.load("model_path")
+
+quantized_model = nncf.quantize_with_accuracy_control(
+    model,
+    calibration_dataset=calibration_dataset,
+    validation_dataset=validation_dataset,
+    validation_fn=validate,
+    max_drop=0.01,
+    drop_type=nncf.DropType.ABSOLUTE,
+)
+#! [quantization]
+
+#! [inference]
+import openvino as ov
+
+# convert ONNX model to OpenVINO model
+ov_quantized_model = ov.convert_model(quantized_model)
+
+# compile the model to transform quantized operations to int8
+model_int8 = ov.compile_model(ov_quantized_model)
+
+input_fp32 = ... # FP32 model input
+res = model_int8(input_fp32)
+
+#! [inference]
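
Note on the `[validation]` fragment: the `validate` function above runs the model over each batch, takes the argmax of the output, and compares it with the reference labels. A dependency-free sketch of that loop may help when checking the logic without ONNX Runtime installed. Here `predict` is a hypothetical stand-in for the `session.run(...)` call, `top1_accuracy` is a hypothetical name, and plain Python lists replace the tensors used in the snippet.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Scores = Sequence[Sequence[float]]      # one row of class scores per sample
Batch = Tuple[Scores, Sequence[int]]    # (model inputs, reference labels)


def top1_accuracy(predict: Callable[[Scores], Scores],
                  batches: Iterable[Batch]) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    total = 0
    for inputs, targets in batches:
        scores = predict(inputs)  # stand-in for the inference call
        for row, target in zip(scores, targets):
            # argmax over the class scores of a single sample
            predicted = max(range(len(row)), key=row.__getitem__)
            correct += int(predicted == target)
            total += 1
    if total == 0:
        raise ValueError("validation data is empty")
    return correct / total


if __name__ == "__main__":
    # Toy data: the "model" is the identity, so inputs already are class scores.
    toy_batches: List[Batch] = [
        ([[0.1, 0.9], [0.8, 0.2]], [1, 0]),
        ([[0.3, 0.7]], [0]),
    ]
    print(top1_accuracy(lambda x: x, toy_batches))
```

The empty-data check is an addition of this sketch; the snippet in the diff does not guard against an empty validation loader.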