Add a document for quantization on NNPA #3045
Conversation
Signed-off-by: Tung D. Le <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>
@jenkins-droid test this please
LGTM, just have a few questions.
docs/Quantization-NNPA.md (Outdated)
- supports per-tensor dynamic quantization, and
- quantizes data tensors from float32 to 8-bit signed integer because NNPA supports 8-bit signed integers. If a data tensor in the input model is already in 8-bit signed integer, the compiler will not quantize it again.

The compiler provides two compile flags for quantizing a model at compile time:
Since these flags are targeting only dynamic quantization, shall we specify dynamically quantizing here?
maybe we could use --nnpa-dquant for dynamic quantization, and use it systematically for both options.
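For context on the quoted text above, here is a minimal NumPy sketch of what per-tensor dynamic quantization of float32 data to 8-bit signed integers computes. It is a generic illustration only, not onnx-mlir's implementation, and the function name is made up for this example.

```python
import numpy as np

def dynamic_quantize_per_tensor(x):
    """Illustrative per-tensor dynamic quantization of float32 data to int8.

    'Dynamic' means scale and zero point are derived from the values seen at
    runtime; 'per-tensor' means a single (scale, zero_point) pair covers the
    whole tensor.
    """
    qmin, qmax = -128, 127                          # int8 range
    rmin = min(float(x.min()), 0.0)                 # keep 0.0 exactly representable
    rmax = max(float(x.max()), 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

x = np.random.randn(4, 8).astype(np.float32)
q, scale, zp = dynamic_quantize_per_tensor(x)
x_hat = (q.astype(np.float32) - zp) * scale         # dequantize to check the error
print("max abs error:", np.abs(x - x_hat).max())
```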
docs/Quantization-NNPA.md (Outdated)

# Performance notes

It is often the case that symmetric quantization leads to better inference performance but poorer accuracy than asymetric quantization.
Typo: asymmetric
LGTM, will leave the final word to @Sunny-Anand.
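To illustrate the trade-off the quoted performance note describes, the sketch below contrasts how symmetric and asymmetric per-tensor quantization parameters are typically derived. This is generic quantization math, not code from the document under review; the helper names are made up.

```python
import numpy as np

def symmetric_params(x, qmax=127):
    # Symmetric: zero point fixed at 0, scale taken from the largest magnitude.
    # Cheaper at inference time (no zero-point correction terms in the matmul),
    # but it wastes range when the data is skewed, which can hurt accuracy.
    scale = max(abs(float(x.min())), abs(float(x.max()))) / qmax
    return scale, 0

def asymmetric_params(x, qmin=-128, qmax=127):
    # Asymmetric: the zero point shifts the quantized range to fit the actual
    # min/max, usually giving better accuracy at a small extra runtime cost.
    rmin = min(float(x.min()), 0.0)
    rmax = max(float(x.max()), 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    return scale, int(round(qmin - rmin / scale))

x = np.abs(np.random.randn(1000)).astype(np.float32)  # skewed, all-positive data
print("symmetric :", symmetric_params(x))
print("asymmetric:", asymmetric_params(x))
```

With all-positive data like this, the symmetric scheme effectively uses only half of the int8 range, which is one way the accuracy gap mentioned in the note can show up.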
Signed-off-by: Tung D. Le <[email protected]>
@Sunny-Anand @AlexandreEichenberger could you take another look at my new changes based on your comments? I'd like to make sure everything is clear before merging. Thanks!
LGTM. Thanks for the changes.
docs/Quantization-NNPA.md (Outdated)

# Overview

NNPA in IBM Telum II supports 8-bit signed-integer quantized matrix multiplications. This document shows how to compile an ONNX model for quantization on NNPA.
There seems to be no consensus on what quantization means. It always means going from a higher precision to a lower precision, but I don't think it necessarily implies integer representation. See here for example
https://huggingface.co/docs/optimum/en/concept_guides/quantization
So maybe we could be a bit clearer here.
=====
NNPA in IBM Telum II supports 8-bit signed-integer quantized matrix multiplications. This document shows how to compile an ONNX model for 8-bit quantization on NNPA. When not following these steps, models will still be accelerated when targeting Telum systems using a mixture of 16-bit floating-point numbers for computations mapped to the Telum's Integrated AI accelerator and 32-bit floating-point numbers for computations mapped to the Telum CPUs.
=====
I think that once this is out of the way, we may continue having the text below without changes. Or one could use "8-bit integer quantization" once at the beginning of sections.
Thanks! I updated it with your content.
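As a rough, generic picture of what the 8-bit signed-integer quantized matrix multiplication mentioned in the quoted overview computes (a sketch only, not the NNPA or onnx-mlir implementation), one can quantize both operands to int8, multiply with int32 accumulation, and rescale the result:

```python
import numpy as np

def quantize_sym(x, qmax=127):
    # Symmetric per-tensor quantization to int8 (zero point 0), for brevity.
    scale = max(abs(float(x.min())), abs(float(x.max()))) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

a = np.random.randn(4, 16).astype(np.float32)
b = np.random.randn(16, 8).astype(np.float32)

qa, sa = quantize_sym(a)
qb, sb = quantize_sym(b)

# Integer matmul with int32 accumulation, then rescale back to float32.
acc = qa.astype(np.int32) @ qb.astype(np.int32)
c_quant = acc.astype(np.float32) * (sa * sb)

c_ref = a @ b
print("max abs error:", np.abs(c_quant - c_ref).max())  # small quantization error
```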
Signed-off-by: Tung D. Le <[email protected]>
Jenkins Linux ppc64le Build #15164 [push] Add a document for quant... started at 23:04
Jenkins Linux s390x Build #16138 [push] Add a document for quant... started at 22:47
Jenkins Linux amd64 Build #16136 [push] Add a document for quant... started at 21:47
Jenkins Linux amd64 Build #16136 [push] Add a document for quant... passed after 1 hr 25 min
Jenkins Linux s390x Build #16138 [push] Add a document for quant... passed after 1 hr 27 min
Jenkins Linux ppc64le Build #15164 [push] Add a document for quant... passed after 2 hr 26 min
This PR adds a document file, docs/Quantization-NNPA.md, to explain how to use quantization on NNPA.