Releases
=============

## 1.11.0

We are excited to announce the Intel® Extension for PyTorch\* 1.11.0-cpu release, which closely follows the PyTorch 1.11 release. With extension 1.11, we focused on continually improving the out-of-box (OOB) user experience and performance. Highlights include:

* Support a single binary with runtime dynamic dispatch based on AVX2/AVX512 hardware ISA detection
* Support installing the binary from `pip` with the package name only (no need to specify the URL)
* Provide a C++ SDK installer to ease C++ application development and deployment
* Add more optimizations, including graph fusions that speed up Transformer-based models and CNNs
* Reduce the binary size of both the pip wheel (~100MB to ~40MB) and the C++ SDK (~220MB to ~13.5MB) compared to the previous version
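
The first highlight, a single binary that picks its kernels at runtime, follows a common dispatch pattern. The following is a minimal, hypothetical Python sketch of that pattern, assuming a Linux host where CPU feature flags can be read from `/proc/cpuinfo`. The function names, the kernel table, and the exact set of AVX512 flags checked are illustrative assumptions, not the extension's actual (C++) dispatch code.

```python
from pathlib import Path

# Assumed flag set for the "AVX512" build; a real dispatcher may check more or fewer bits.
AVX512_FLAGS = {"avx512f", "avx512bw", "avx512vl", "avx512dq"}


def read_cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the CPU feature flags reported by Linux, or an empty set if unavailable."""
    path = Path(cpuinfo_path)
    if not path.is_file():
        return set()
    for line in path.read_text().splitlines():
        # x86 kernels report e.g. "flags\t\t: fpu vme ... avx2 ... avx512f ..."
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def select_isa(flags):
    """Choose the best kernel variant supported by this CPU (hypothetical policy)."""
    if AVX512_FLAGS <= flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "default"


# Hypothetical kernel table: the same operation compiled once per ISA.
# Each "kernel" here just tags its result so the chosen path is visible.
KERNELS = {
    "avx512": lambda xs: ("avx512", sum(xs)),
    "avx2": lambda xs: ("avx2", sum(xs)),
    "default": lambda xs: ("default", sum(xs)),
}


def dispatch(xs, flags=None):
    """Detect the ISA (unless flags are given) and forward to the matching kernel."""
    if flags is None:
        flags = read_cpu_flags()
    return KERNELS[select_isa(flags)](xs)
```

For example, `dispatch([1, 2, 3], flags={"avx2", "fma"})` returns `("avx2", 6)`. A production dispatcher would detect the ISA once at load time and cache the choice instead of re-reading the flags on every call.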

### Highlights
- Combine the AVX2 and AVX512 binaries into a single binary that automatically dispatches to the appropriate implementation based on hardware ISA detection at runtime. A typical use case is a data center that mixes AVX2-only and AVX512 platforms: unlike the previous version, there is no longer any need to deploy a separate binary for each ISA.

  ***NOTE***: The extension uses the oneDNN library as its backend. However, the BF16 and INT8 operator sets and features differ between AVX2 and AVX512. Please refer to the [oneDNN documentation](https://oneapi-src.github.io/oneDNN/dev_guide_int8_computations.html#processors-with-the-intel-avx2-or-intel-avx-512-support) for more details.

  > When one input is of type u8, and the other one is of type s8, oneDNN assumes that it is the user’s responsibility to choose the quantization parameters so that no overflow/saturation occurs. For instance, a user can use u7 [0, 127] instead of u8 for the unsigned input, or s7 [-64, 63] instead of the s8 one. It is worth mentioning that this is required only when the Intel AVX2 or Intel AVX512 Instruction Set is used.

- The extension wheel packages have been uploaded to [pypi.org](https://pypi.org/project/intel-extension-for-pytorch/). Users can install the extension directly with `pip`/`pip3` without explicitly specifying the binary location URL.

<table align="center">
<tbody>
<tr>
<td>v1.10.100-cpu</td>
<td>v1.11.0-cpu</td>
</tr>
<tr>
<td>

```bash
python -m pip install intel_extension_for_pytorch==1.10.100 -f https://software.intel.com/ipex-whl-stable
```
</td>
<td>

```bash
pip install intel_extension_for_pytorch
```
</td>
</tr>
</tbody>
</table>
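
The oneDNN note on u8/s8 inputs in the first highlight comes down to integer range arithmetic. As we understand it, the AVX2/AVX512 INT8 path (without VNNI) multiplies u8 by s8 and sums adjacent pairs of products into a saturating 16-bit intermediate (instructions such as `vpmaddubsw`) before widening. The plain-Python sketch below, with hypothetical function names, checks which input ranges can overflow that intermediate and shows an affine quantizer that targets u7 [0, 127] instead of u8.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def pair_sum_range(u_max, s_min, s_max):
    """Range of u0*s0 + u1*s1 for u in [0, u_max] and s in [s_min, s_max].

    Assumes s_min <= 0 <= s_max, so each product is extreme when u = u_max.
    """
    return 2 * u_max * s_min, 2 * u_max * s_max


def fits_int16(u_max, s_min, s_max):
    """True if a 16-bit pairwise multiply-add cannot saturate for these ranges."""
    lo, hi = pair_sum_range(u_max, s_min, s_max)
    return INT16_MIN <= lo and hi <= INT16_MAX


def quantize_unsigned(values, x_min, x_max, qmax=127):
    """Affine-quantize floats onto [0, qmax]; qmax=127 gives u7, qmax=255 gives u8.

    Assumes x_min <= 0 <= x_max so the zero point lands inside [0, qmax].
    """
    scale = (x_max - x_min) / qmax
    zero_point = round(-x_min / scale)
    q = [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


print(fits_int16(255, -128, 127))  # u8 x s8 -> False
print(fits_int16(127, -128, 127))  # u7 x s8 -> True
print(fits_int16(255, -64, 63))    # u8 x s7 -> True
```

Either narrowing from the quoted note (u7 for the unsigned input, or s7 for the signed one) keeps the worst-case pair sum inside int16. PyTorch's own observers, such as `torch.quantization.MinMaxObserver`, expose a `reduce_range=True` option that drops one bit from the quantized range for the same reason; these notes do not say whether the extension's default configuration enables it.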

- Compared to the previous version, this release provides a dedicated installation file for the C++ SDK. The installer automatically detects the location of the PyTorch C++ SDK and installs the extension's C++ SDK files into it, so users no longer need to manually add the extension's C++ SDK source files and CMake files to the PyTorch SDK. The installer also reduces the C++ SDK binary size from ~220MB to ~13.5MB.

<table align="center">
<tbody>
<tr>
<td>v1.10.100-cpu</td>
<td>v1.11.0-cpu</td>
</tr>
<tr>
<td>

```text
intel-ext-pt-cpu-libtorch-shared-with-deps-1.10.0+cpu.zip (220M)
intel-ext-pt-cpu-libtorch-cxx11-abi-shared-with-deps-1.10.0+cpu.zip (224M)
```
</td>
<td>

```text
libintel-ext-pt-1.11.0+cpu.run (13.7M)
libintel-ext-pt-cxx11-abi-1.11.0+cpu.run (13.5M)
```
</td>
</tr>
</tbody>
</table>

- Add more optimizations, including more custom operators and fusions.
  - Fuse the QKV linear operators into a single Linear operator to accelerate the encoder part of Transformer\* models (BERT-\*). [#278](https://github.com/intel/intel-extension-for-pytorch/commit/0f27c269cae0f902973412dc39c9a7aae940e07b)
  - Remove the Multi-Head-Attention fusion limitation so that tensor shapes that are not 64-byte aligned are supported. [#531](https://github.com/intel/intel-extension-for-pytorch/commit/dbb10fedb00c6ead0f5b48252146ae9d005a0fad)
  - Fold binary operators into Convolution and Linear operators to reduce computation. [#432](https://github.com/intel/intel-extension-for-pytorch/commit/564588561fa5d45b8b63e490336d151ff1fc9cbc) [#438](https://github.com/intel/intel-extension-for-pytorch/commit/b4e7dacf08acd849cecf8d143a11dc4581a3857f) [#602](https://github.com/intel/intel-extension-for-pytorch/commit/74aa21262938b923d3ed1e6929e7d2b629b3ff27)
  - Replace out-of-place operators with their in-place counterparts to reduce the memory footprint. The extension currently does this for the following operators: `silu`, `sigmoid`, `tanh`, `hardsigmoid`, `hardswish`, `relu6`, `relu`, `selu`, `softmax`. [#524](https://github.com/intel/intel-extension-for-pytorch/commit/38647677e8186a235769ea519f4db65925eca33c)
  - Fuse Concat + BN + ReLU into a single operator. [#452](https://github.com/intel/intel-extension-for-pytorch/commit/275ff503aea780a6b741f04db5323d9529ee1081)
  - Optimize Conv3D in both imperative and JIT modes by enabling NHWC and pre-packing the weight. [#425](https://github.com/intel/intel-extension-for-pytorch/commit/ae33faf62bb63b204b0ee63acb8e29e24f6076f3)
- Reduce the binary size: the C++ SDK is reduced from ~220MB to ~13.5MB, and the wheel package from ~100MB to ~40MB.
- Update oneDNN and oneDNN Graph to [2.5.2](https://github.com/oneapi-src/oneDNN/releases/tag/v2.5.2) and [0.4.2](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.2), respectively.
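
The QKV fusion and the binary-operator folding listed above both rest on simple linear-algebra identities. The pure-Python sketch below uses hypothetical helper names, with plain nested lists standing in for tensors so it runs without PyTorch. It checks that three separate Q/K/V projections equal one Linear over row-concatenated weights followed by a split, and that a scalar add and multiply after a Linear can be folded into its weight and bias. It illustrates the math only, not the extension's graph-rewrite implementation.

```python
def linear(x, weight, bias):
    """y = x @ weight^T + bias on nested lists (rows of x, rows of weight)."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b for w_row, b in zip(weight, bias)]
        for row in x
    ]


def fused_qkv(x, wq, bq, wk, bk, wv, bv):
    """One Linear over row-concatenated Q/K/V weights, then split the output."""
    d = len(wq)  # out_features of each projection (assumed equal for Q, K, V)
    out = linear(x, wq + wk + wv, bq + bk + bv)  # list "+" plays the role of torch.cat(dim=0)
    return (
        [row[:d] for row in out],
        [row[d:2 * d] for row in out],
        [row[2 * d:] for row in out],
    )


def fold_scalar_add_mul(weight, bias, add=0, mul=1):
    """Rewrite (linear(x) + add) * mul as a single linear with new weight and bias."""
    new_weight = [[w * mul for w in w_row] for w_row in weight]
    new_bias = [(b + add) * mul for b in bias]
    return new_weight, new_bias


x = [[1, 2], [3, 4]]
wq, bq = [[1, 0], [0, 1]], [0, 1]
wk, bk = [[2, 1], [1, 2]], [1, 0]
wv, bv = [[0, 3], [3, 0]], [2, 2]

separate = (linear(x, wq, bq), linear(x, wk, bk), linear(x, wv, bv))
print(fused_qkv(x, wq, bq, wk, bk, wv, bv) == separate)  # True

unfolded = [[(y + 5) * 2 for y in row] for row in linear(x, wk, bk)]
print(linear(x, *fold_scalar_add_mul(wk, bk, add=5, mul=2)) == unfolded)  # True
```

With real tensors the same rewrite would build the fused weight with `torch.cat([wq, wk, wv], dim=0)` and recover Q, K, and V with `out.split(d, dim=-1)`; the benefit is one larger GEMM instead of three smaller ones.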

### What's Changed
**Full Changelog**: https://github.com/intel/intel-extension-for-pytorch/compare/v1.10.100...v1.11.0

## 1.10.100

This release fixes the following issues:
- Resolve the issue that PyTorch Tensor Expression (TE) did not work after importing the extension.
- Wrap BatchNorm (BN) as another operator to break TE's BN-related fusions, because the BN performance of PyTorch Tensor Expression cannot match that of PyTorch ATen BN.
- Update the [documentation](https://intel.github.io/intel-extension-for-pytorch/)
  - Fix the INT8 quantization example issue #205
  - Polish the installation guide

## 1.10.0
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv = nn.Conv2d(10, 10, 3)

    def forward(self, x):
        x = self.conv(x)
        return x