docs/tutorials/releases.md (+17 −16)
@@ -9,11 +9,11 @@ Highlights include:
 - Automatic INT8 quantization became a stable feature baking into a well-tuned default quantization recipe, supporting both static and dynamic quantization and a wide range of calibration algorithms.
 - Runtime Extension, featured MultiStreamModule, became a stable feature, could further enhance throughput in offline inference scenario.
 - More optimizations in graph and operations to improve performance of broad set of models, examples include but not limited to wave2vec, T5, Albert etc.
-- We provide a pre-built experimental binary with oneDNN Graph Compiler tuned on, which would deliver additional performance gain for Bert, Albert, Roberta in INT8 inference.
+- Pre-built experimental binary with oneDNN Graph Compiler tuned on would deliver additional performance gain for Bert, Albert, Roberta in INT8 inference.
 
 ### Highlights
 
-- Finalize the quantization feature from experimental to golden. We facilitate the user experience to be fully compatible with that of PyTorch. In this release, the extension supports PyTorch calibration algorithm directly and uses `Histogram` for calibration by default. Comparing to the previous version, changes are listed as following. Refer to [tutorial page](features/int8.md) for more details.
+- Matured automatic INT8 quantization feature baking into a well-tuned default quantization recipe. We facilitated the user experience and provided a wide range of calibration algorithms like Histogram, MinMax, MovingAverageMinMax, etc. Meanwhile, we polished the static quantization with better flexibility and enabled dynamic quantization as well. Compared to the previous version, the brief changes are as follows. Refer to [tutorial page](features/int8.md) for more details.
 
 <table align="center">
 <tbody>
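The INT8 highlight in the hunk above keeps the PyTorch-style prepare/calibrate/convert flow described in the linked int8 tutorial. Below is a minimal sketch of static quantization under that flow; the helper names (`prepare`, `convert`, `ipex.quantization.default_static_qconfig`) are assumed from that tutorial and should be checked against the installed release.

```python
import torch
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

# Toy FP32 model standing in for a real workload.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example_input = torch.rand(1, 3, 32, 32)

# Static quantization with the default recipe; dynamic quantization would
# instead use ipex.quantization.default_dynamic_qconfig and need no calibration.
qconfig = ipex.quantization.default_static_qconfig
prepared = prepare(model, qconfig, example_inputs=example_input, inplace=False)

# Calibration: run representative data through the prepared model.
with torch.no_grad():
    for _ in range(10):
        prepared(torch.rand(1, 3, 32, 32))

# Convert to INT8, then trace and freeze to get the optimized graph.
quantized = convert(prepared)
with torch.no_grad():
    traced = torch.jit.trace(quantized, example_input)
    traced = torch.jit.freeze(traced)
```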
@@ -64,7 +64,7 @@ Highlights include:
 </tbody>
 </table>
 
-- Improve runtime performance and user experience. In this release, we enhance the heuristic rule to make the Runtime Extension feature benefit OOB models in most situations. Meanwhile, we also provide `ipex.cpu.runtime.MultiStreamModuleHint` to customize how to distribute input into streams and then concatenate outputs from each stream.
+- Runtime Extension, featured MultiStreamModule, became a stable feature. In this release, we enhanced the heuristic rule to further enhance throughput in offline inference scenarios. Meanwhile, we also provide `ipex.cpu.runtime.MultiStreamModuleHint` to customize how to split the input into streams and concatenate the output from each stream.
 
 <table align="center">
 <tbody>
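For the Runtime Extension change above, here is a rough sketch of wrapping a model in `MultiStreamModule` with explicit hints for splitting inputs and concatenating outputs. The `CPUPool`, `num_streams`, `input_split_hint`, and `output_concat_hint` arguments are assumptions based on the runtime tutorial and may differ between versions.

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(64, 8).eval()

# Pin all streams to one socket; MultiStreamModule runs the streams in
# parallel, which targets offline (throughput-bound) inference.
cpu_pool = ipex.cpu.runtime.CPUPool(node_id=0)

# Hints: split the single input tensor and concatenate the outputs along dim 0.
split_hint = ipex.cpu.runtime.MultiStreamModuleHint(0)
concat_hint = ipex.cpu.runtime.MultiStreamModuleHint(0)

multi_stream = ipex.cpu.runtime.MultiStreamModule(
    model,
    num_streams=2,
    cpu_pool=cpu_pool,
    input_split_hint=split_hint,
    output_concat_hint=concat_hint,
)

with torch.no_grad():
    output = multi_stream(torch.rand(16, 64))  # a batch of 16 split across 2 streams
```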
@@ -104,7 +104,7 @@ Highlights include:
 </tbody>
 </table>
 
-- Polish `ipex.optimize` to take input shape information. With additional shape information, it is possible to choose the optimal memory layout to improve kernel efficiency.
+- Polished `ipex.optimize` to accept the input shape information, from which the optimal memory layout is deduced for better kernel efficiency.
 
 <table align="center">
 <tbody>
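The `ipex.optimize` change above amounts to passing a representative input so the extension can settle on a memory layout ahead of time. A minimal sketch, assuming the `sample_input` keyword from the 1.12 documentation:

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).eval()
sample = torch.rand(1, 3, 224, 224)

# A representative input lets ipex.optimize choose the memory layout
# (e.g. channels last vs. blocked) that fits the shapes it will actually see.
optimized = ipex.optimize(model, dtype=torch.float32, sample_input=sample)

with torch.no_grad():
    out = optimized(sample)
```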
@@ -139,19 +139,20 @@ Highlights include:
 </tbody>
 </table>
 
-- Fuse Adam to improve training performance [#822](https://github.com/intel/intel-extension-for-pytorch/commit/d3f714e54dc8946675259ea7a445b26a2460b523)
-- Support Deconv3D to serve most models like xxx and implement most fusions like Conv
-- Enable LSTM to support static and dynamic quantization [#692](https://github.com/intel/intel-extension-for-pytorch/commit/2bf8dba0c380a26bbb385e253adbfaa2a033a785)
-- Enable Linear to support dynamic quantization [#787](https://github.com/intel/intel-extension-for-pytorch/commit/ff231fb55e33c37126a0ef7f0e739cd750d1ef6c)
-- Add more optimizations, including more custom operators and fusions.
-  - Fuse `Add` + `Swish` to accelerate FSI Riskful model [#551](https://github.com/intel/intel-extension-for-pytorch/commit/cc855ff2bafd245413a6111f3d21244d0bcbb6f6)
-  - Optimize `Convolution1D` to support channels last memory layout and fuse `GeLU` as its post operation. [#657](https://github.com/intel/intel-extension-for-pytorch/commit/a0c063bdf4fd1a7e66f8a23750ac0c2fe471a559)
-  - Fuse `Einsum` + `Add` to boost Alphafold2 [#674](https://github.com/intel/intel-extension-for-pytorch/commit/3094f346a67c81ad858ad2a80900fab4c3b4f4e9)
+- Provided more optimizations in graph and operations
+  - Fuse Adam to improve training performance [#822](https://github.com/intel/intel-extension-for-pytorch/commit/d3f714e54dc8946675259ea7a445b26a2460b523)
+  - Enable Normalization operators to support channels-last 3D [#642](https://github.com/intel/intel-extension-for-pytorch/commit/ae268ac1760d598a29584de5c99bfba46c6554ae)
+  - Support Deconv3D to serve most models and implement most fusions like Conv
+  - Enable LSTM to support static and dynamic quantization [#692](https://github.com/intel/intel-extension-for-pytorch/commit/2bf8dba0c380a26bbb385e253adbfaa2a033a785)
+  - Enable Linear to support dynamic quantization [#787](https://github.com/intel/intel-extension-for-pytorch/commit/ff231fb55e33c37126a0ef7f0e739cd750d1ef6c)
+  - Fusions.
+    - Fuse `Add` + `Swish` to accelerate FSI Riskful model [#551](https://github.com/intel/intel-extension-for-pytorch/commit/cc855ff2bafd245413a6111f3d21244d0bcbb6f6)
+    - Optimize `Convolution1D` to support channels last memory layout and fuse `GeLU` as its post operation. [#657](https://github.com/intel/intel-extension-for-pytorch/commit/a0c063bdf4fd1a7e66f8a23750ac0c2fe471a559)
+    - Fuse `Einsum` + `Add` to boost Alphafold2 [#674](https://github.com/intel/intel-extension-for-pytorch/commit/3094f346a67c81ad858ad2a80900fab4c3b4f4e9)
 - `RuntimeError: Overflow when unpacking long` when a tensor's min max value exceeds int range while performing int8 calibration. Please customize QConfig to use min-max calibration method.
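The known issue quoted above suggests switching calibration to a min-max method. A hedged sketch of a custom QConfig doing that with stock PyTorch observers follows; the observer choices are illustrative rather than the extension's required ones.

```python
import torch
from torch.quantization import MinMaxObserver, PerChannelMinMaxObserver, QConfig
from intel_extension_for_pytorch.quantization import prepare

# Calibrate activations with plain min/max statistics instead of a histogram,
# which sidesteps the overflow seen when a tensor's range exceeds int bounds.
minmax_qconfig = QConfig(
    activation=MinMaxObserver.with_args(qscheme=torch.per_tensor_affine,
                                        dtype=torch.quint8),
    weight=PerChannelMinMaxObserver.with_args(dtype=torch.qint8,
                                              qscheme=torch.per_channel_symmetric),
)

model = torch.nn.Linear(16, 4).eval()
prepared = prepare(model, minmax_qconfig, example_inputs=torch.rand(2, 16),
                   inplace=False)
# ...calibrate and convert as usual from here.
```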