Commit

Update quantization.md
Signed-off-by: Wang, Mengni <[email protected]>
mengniwang95 authored Jul 25, 2024
1 parent 34790d1 commit 4df230e
Showing 1 changed file with 5 additions and 7 deletions.
12 changes: 5 additions & 7 deletions docs/quantization.md
@@ -9,9 +9,7 @@ Quantization

3.2 [Specify Quantization Rules](#specify-quantization-rules)

-3.3 [Specify Quantization Recipes](#specify-quantization-recipes)
-
-3.4 [Specify Quantization Backend and Device](#specify-quantization-backend-and-device)
+3.3 [Specify Quantization Backend and Device](#specify-quantization-backend-and-device)
4. [Examples](#examples)

## Quantization Introduction
@@ -30,11 +28,11 @@ This is so-called `asymmetric quantization`, in which we map the min/max range i

here:

-If INT8 is specified, $Scale = (|X_{f_{max}} - X_{f_{min}}|) / 127$ and $ZeroPoint = -128 - X_{f_{min}} / Scale$.
+If INT8 is specified, $Scale = (|X_{max} - X_{min}|) / 127$ and $ZeroPoint = -128 - X_{min} / Scale$.

or

-If UINT8 is specified, $Scale = (|X_{f_{max}} - X_{f_{min}}|) / 255$ and $ZeroPoint = - X_{f_{min}} / Scale$.
+If UINT8 is specified, $Scale = (|X_{max} - X_{min}|) / 255$ and $ZeroPoint = - X_{min} / Scale$.

**Scale Quantization**

@@ -44,11 +42,11 @@ The math equation is like:

here:

-If INT8 is specified, $Scale = max(abs(X_{f_{max}}), abs(X_{f_{min}})) / 127$ and $ZeroPoint = 0$.
+If INT8 is specified, $Scale = max(abs(X_{max}), abs(X_{min})) / 127$ and $ZeroPoint = 0$.

or

-If UINT8 is specified, $Scale = max(abs(X_{f_{max}}), abs(X_{f_{min}})) / 255$ and $ZeroPoint = 128$.
+If UINT8 is specified, $Scale = max(abs(X_{max}), abs(X_{min})) / 255$ and $ZeroPoint = 128$.

*NOTE*

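For readers checking the renamed symbols, here is a minimal Python sketch that evaluates the `Scale` and `ZeroPoint` expressions exactly as they are written on the updated lines of this diff. The function names `affine_params` and `symmetric_params` are hypothetical and are not part of the library. The quantization equation itself is collapsed in this view, so the sketch stops at the two parameters. It also leaves `ZeroPoint` unrounded, because the diff does not say how it is rounded.

```python
def affine_params(x_min: float, x_max: float, dtype: str = "int8"):
    """Scale and ZeroPoint for the asymmetric case, transcribed from the diff."""
    span = abs(x_max - x_min)  # |X_max - X_min|; a zero span would divide by zero below
    if dtype == "int8":
        scale = span / 127
        zero_point = -128 - x_min / scale
    elif dtype == "uint8":
        scale = span / 255
        zero_point = -x_min / scale
    else:
        raise ValueError(f"unsupported dtype: {dtype}")
    return scale, zero_point


def symmetric_params(x_min: float, x_max: float, dtype: str = "int8"):
    """Scale and ZeroPoint for the 'Scale Quantization' case, transcribed from the diff."""
    peak = max(abs(x_max), abs(x_min))
    if dtype == "int8":
        return peak / 127, 0
    if dtype == "uint8":
        return peak / 255, 128
    raise ValueError(f"unsupported dtype: {dtype}")


if __name__ == "__main__":
    print(affine_params(-1.0, 1.0, "uint8"))
    print(symmetric_params(-2.0, 1.0, "int8"))
```

Only the symbol names change in this commit ($X_{f_{max}}$ becomes $X_{max}$, $X_{f_{min}}$ becomes $X_{min}$), so the same code describes the formulas before and after it.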