Update quantization.md
Signed-off-by: Wang, Mengni <[email protected]>
mengniwang95 authored Jul 25, 2024
1 parent 4df230e commit 9197311
Showing 1 changed file with 0 additions and 8 deletions.
docs/quantization.md
@@ -63,14 +63,6 @@ Sometimes the reduce_range feature, which uses a 7-bit width (1 sign bit + 6 data bits)
> Activation (uint8) + Weight (int8) is recommended for performance on x86-64 machines with AVX2 and AVX512 extensions.

#### Quantization Scheme
+ Symmetric Quantization
    + int8: scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1); zero_point = 0
    + uint8: scale = 2 * max(abs(rmin), abs(rmax)) / (max(uint8) - min(uint8)); zero_point = 0
+ Asymmetric Quantization
    + int8: scale = (rmax - rmin) / (max(int8) - min(int8)); zero_point = round(min(int8) - rmin / scale)
    + uint8: scale = (rmax - rmin) / (max(uint8) - min(uint8)); zero_point = round(min(uint8) - rmin / scale)
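
For illustration, here is a minimal Python sketch of the formulas above; the helper name `compute_qparams` is hypothetical, not part of any library API:

```python
def compute_qparams(rmin, rmax, dtype="int8", symmetric=True):
    """Compute scale and zero_point per the scheme above (sketch only)."""
    qmin, qmax = (-128, 127) if dtype == "int8" else (0, 255)
    if symmetric:
        # int8 subtracts 1 from the denominator (254 = 2 * 127) so the
        # quantized range stays centered on zero; zero_point is always 0.
        denom = (qmax - qmin - 1) if dtype == "int8" else (qmax - qmin)
        scale = 2 * max(abs(rmin), abs(rmax)) / denom
        zero_point = 0
    else:
        # Asymmetric: map the real range [rmin, rmax] onto the full
        # integer range, then shift by a rounded zero_point.
        scale = (rmax - rmin) / (qmax - qmin)
        zero_point = round(qmin - rmin / scale)
    return scale, zero_point

# e.g. asymmetric uint8 over the real range [-1.0, 3.0]:
scale, zp = compute_qparams(-1.0, 3.0, dtype="uint8", symmetric=False)
# scale = 4 / 255 ~= 0.0157, zero_point = round(0 - (-1.0) / scale) = 64
```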

#### Reference
+ MLAS: [MLAS Quantization](https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/quantization/onnx_quantizer.py)

