Commit bd7349a

update docs for 1.11 release (#618)
1 parent c5407bf commit bd7349a

File tree: 5 files changed, +102 −32 lines

README.md (+7 −7)

@@ -8,18 +8,18 @@ More detailed tutorials are available at [**Intel® Extension for PyTorch\* onli

## Installation

-Wheel files are avaiable for the following Python versions.
+You can use either of the following 2 commands to install Intel® Extension for PyTorch\*.

-| Extension Version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 |
-| :--: | :--: | :--: | :--: | :--: |
-| 1.10.100 | ✔️ | ✔️ | ✔️ | ✔️ |
+```python
+python -m pip install intel_extension_for_pytorch
+```

```python
-python -m pip install torch==1.10.0+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
-python -m pip install intel_extension_for_pytorch==1.10.100 -f https://software.intel.com/ipex-whl-stable
-python -m pip install psutil
+python -m pip install intel_extension_for_pytorch -f https://software.intel.com/ipex-whl-stable
```

+**Note:** Intel® Extension for PyTorch\* has a PyTorch version requirement. Please check more detailed information via the URL below.
+
More installation methods can be found at [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/tutorials/installation.html)

## Getting Started
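As a quick sanity check after either install command above (assuming both `torch` and the extension installed into the current environment), the versions can be printed and compared against the compatibility table in the installation guide:

```python
import torch
import intel_extension_for_pytorch as ipex

# Print the installed versions so the PyTorch / extension pairing can be
# verified against the installation guide's mapping table.
print(torch.__version__)
print(ipex.__version__)
```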

docs/tutorials/features.rst (+2)

@@ -44,6 +44,8 @@ Low precision data type BFloat16 has been natively supported on the 3rd Generati

Check more detailed information for `Auto Mixed Precision (AMP) <features/amp.html>`_.

+BFloat16 computation can be conducted on platforms with the AVX512 instruction set. On platforms with the `AVX512 BFloat16 instruction <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-deep-learning-boost-new-instruction-bfloat16.html>`_, there will be a further performance boost.
+
.. toctree::
:hidden:
:maxdepth: 1
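To illustrate the BFloat16 path described in the paragraph added above, a minimal CPU auto-mixed-precision sketch (the toy model and shapes are illustrative; it assumes the `ipex.optimize` and `torch.cpu.amp.autocast` APIs available in this release) could look like:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

# Inference-only sketch: run a small model in BFloat16 via CPU autocast.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU()).eval()
model = ipex.optimize(model, dtype=torch.bfloat16)

x = torch.randn(8, 64)
with torch.no_grad(), torch.cpu.amp.autocast():
    y = model(x)

print(y.dtype)  # expected to be torch.bfloat16 inside the autocast region
```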

docs/tutorials/installation.md (+3 −1)

@@ -49,6 +49,8 @@ Prebuilt wheel files availability matrix for Python versions

| 1.9.0 | ✔️ | ✔️ | ✔️ | ✔️ | |
| 1.8.0 | | ✔️ | | | |

+**Note:** Intel® Extension for PyTorch\* has a PyTorch version requirement. Please check the mapping table above.
+
Starting from 1.11.0, you can use normal pip command to install the package.

@@ -63,7 +65,7 @@ python -m pip install intel_extension_for_pytorch -f https://software.intel.com/

**Note:** For versions prior to 1.10.0, please use the package name `torch_ipex`, rather than `intel_extension_for_pytorch`.

-**Note:** To install a package with a specific version, please use the standard way of pip.
+**Note:** To install a package with a specific version, please run the following command.

```
python -m pip install <package_name>==<version_name> -f https://software.intel.com/ipex-whl-stable
```

docs/tutorials/performance_tuning/known_issues.md (+5 −21)

@@ -1,31 +1,15 @@

Known Issues
============

-- BFloat16 is currently only supported natively on platforms with the following instruction set. The support will be expanded gradually to more platforms in furture releases.
+- BF16 AMP (auto mixed precision) runs abnormally with the extension on AVX2-only machines if the topology contains `Conv`, `Matmul`, `Linear`, and `BatchNormalization`

-| Instruction Set | Description |
-| --- | --- |
-| AVX512\_CORE | Intel AVX-512 with AVX512BW, AVX512VL, and AVX512DQ extensions |
-| AVX512\_CORE\_VNNI | Intel AVX-512 with Intel DL Boost |
-| AVX512\_CORE\_BF16 | Intel AVX-512 with Intel DL Boost and bfloat16 support |
-| AVX512\_CORE\_AMX | Intel AVX-512 with Intel DL Boost and bfloat16 support and Intel Advanced Matrix Extensions (Intel AMX) with 8-bit integer and bfloat16 support |
+- The runtime extension does not support the scenario where the batch size (BS) is not divisible by the stream number

-- INT8 performance of EfficientNet and DenseNet with Intel® Extension for PyTorch\* is slower than that of FP32
+- Incorrect Conv and Linear result if the number of OMP threads is changed at runtime

-- `omp_set_num_threads` function failed to change OpenMP threads number of oneDNN operators if it was set before.
+  The oneDNN memory layout depends on the number of OMP threads, which requires the caller to detect changes in the number of OMP threads; this release has not implemented that detection yet (a workaround sketch follows this list).

-`omp_set_num_threads` function is provided in Intel® Extension for PyTorch\* to change number of threads used with openmp. However, it failed to change number of OpenMP threads if it was set before.
-
-pseudo code:
-
-```
-omp_set_num_threads(6)
-model_execution()
-omp_set_num_threads(4)
-same_model_execution_again()
-```
-
-**Reason:** oneDNN primitive descriptor stores the omp number of threads. Current oneDNN integration caches the primitive descriptor in IPEX. So if we use runtime extension with oneDNN based pytorch/ipex operation, the runtime extension fails to change the used omp number of threads.
+- INT8 performance of EfficientNet and DenseNet with Intel® Extension for PyTorch\* is slower than that of FP32

- Low performance with INT8 support for dynamic shapes
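For the OMP-threads issue above, a minimal workaround sketch (an editor's illustration, not part of the commit; the model, shapes, and thread count are assumptions, and it relies on the `ipex.optimize` API of this release) is to pin the thread count once before the first model execution and never change it afterwards:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

# Pin the OpenMP thread count once, before any oneDNN work happens,
# and keep it fixed for the lifetime of the process.
torch.set_num_threads(4)

model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU()).eval()
model = ipex.optimize(model)

x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    y1 = model(x)  # first run decides the oneDNN memory layouts
    y2 = model(x)  # later runs reuse them with the same thread count
```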

docs/tutorials/releases.md (+85 −3)

@@ -1,13 +1,95 @@

Releases
=============

+## 1.11.0
+
+We are excited to announce the Intel® Extension for PyTorch\* 1.11.0-cpu release, tightly following the PyTorch 1.11 release. In extension 1.11 we focused on continually improving the out-of-box (OOB) user experience and performance. Highlights include:
+
+* Support a single binary with runtime dynamic dispatch based on AVX2/AVX512 hardware ISA detection
+* Support installing the binary from `pip` with the package name only (without the need to specify the URL)
+* Provide the C++ SDK installation to facilitate ease of C++ app development and deployment
+* Add more optimizations, including graph fusions for speeding up Transformer-based models and CNNs, etc.
+* Reduce the binary size for both the PIP wheel and C++ SDK (2X to 5X reduction from the previous version)
+
+### Highlights
+- Combine the AVX2 and AVX512 binaries into a single binary that automatically dispatches to the matching implementation based on hardware ISA detection at runtime. The typical case is serving a data center that mixes AVX2-only and AVX512 platforms. Compared to the previous version, there is no longer a need to deploy a different binary per ISA.
+
+***NOTE***: The extension uses the oneDNN library as the backend. However, the BF16 and INT8 operator sets and features are different between AVX2 and AVX512. Please refer to the [oneDNN document](https://oneapi-src.github.io/oneDNN/dev_guide_int8_computations.html#processors-with-the-intel-avx2-or-intel-avx-512-support) for more details.
+
+> When one input is of type u8, and the other one is of type s8, oneDNN assumes that it is the user’s responsibility to choose the quantization parameters so that no overflow/saturation occurs. For instance, a user can use u7 [0, 127] instead of u8 for the unsigned input, or s7 [-64, 63] instead of the s8 one. It is worth mentioning that this is required only when the Intel AVX2 or Intel AVX512 Instruction Set is used.
+
+- The extension wheel packages have been uploaded to [pypi.org](https://pypi.org/project/intel-extension-for-pytorch/). Users can now install the extension directly with `pip/pip3` without explicitly specifying the binary location URL.
+
+<table align="center">
+<tbody>
+<tr>
+<td>v1.10.100-cpu</td>
+<td>v1.11.0-cpu</td>
+</tr>
+<tr>
+<td>
+
+```python
+python -m pip install intel_extension_for_pytorch==1.10.100 -f https://software.intel.com/ipex-whl-stable
+```
+</td>
+<td>
+
+```python
+pip install intel_extension_for_pytorch
+```
+</td>
+</tr>
+</tbody>
+</table>
+
+- Compared to the previous version, this release provides a dedicated installation file for the C++ SDK. The installation file automatically detects the PyTorch C++ SDK location and installs the extension C++ SDK files into the PyTorch C++ SDK. The user no longer needs to manually add the extension C++ SDK source files and CMake to the PyTorch SDK. In addition, the installation file reduces the C++ SDK binary size from ~220MB to ~13.5MB.
+
+<table align="center">
+<tbody>
+<tr>
+<td>v1.10.100-cpu</td>
+<td>v1.11.0-cpu</td>
+</tr>
+<tr>
+<td>
+
+```python
+intel-ext-pt-cpu-libtorch-shared-with-deps-1.10.0+cpu.zip (220M)
+intel-ext-pt-cpu-libtorch-cxx11-abi-shared-with-deps-1.10.0+cpu.zip (224M)
+```
+</td>
+<td>
+
+```python
+libintel-ext-pt-1.11.0+cpu.run (13.7M)
+libintel-ext-pt-cxx11-abi-1.11.0+cpu.run (13.5M)
+```
+</td>
+</tr>
+</tbody>
+</table>
+
+- Add more optimizations, including more custom operators and fusions (a TorchScript usage sketch follows this list).
+  - Fuse the QKV linear operators into a single Linear operator to accelerate the Transformer\* (BERT-\*) encoder part - [#278](https://github.com/intel/intel-extension-for-pytorch/commit/0f27c269cae0f902973412dc39c9a7aae940e07b).
+  - Remove the Multi-Head-Attention fusion limitations to support 64-byte-unaligned tensor shapes. [#531](https://github.com/intel/intel-extension-for-pytorch/commit/dbb10fedb00c6ead0f5b48252146ae9d005a0fad)
+  - Fold binary operators into the Convolution and Linear operators to reduce computation. [#432](https://github.com/intel/intel-extension-for-pytorch/commit/564588561fa5d45b8b63e490336d151ff1fc9cbc) [#438](https://github.com/intel/intel-extension-for-pytorch/commit/b4e7dacf08acd849cecf8d143a11dc4581a3857f) [#602](https://github.com/intel/intel-extension-for-pytorch/commit/74aa21262938b923d3ed1e6929e7d2b629b3ff27)
+  - Replace out-of-place operators with their corresponding in-place versions to reduce the memory footprint. The extension currently supports the operators `silu`, `sigmoid`, `tanh`, `hardsigmoid`, `hardswish`, `relu6`, `relu`, `selu`, and `softmax`. [#524](https://github.com/intel/intel-extension-for-pytorch/commit/38647677e8186a235769ea519f4db65925eca33c)
+  - Fuse the Concat + BN + ReLU as a single operator. [#452](https://github.com/intel/intel-extension-for-pytorch/commit/275ff503aea780a6b741f04db5323d9529ee1081)
+  - Optimize Conv3D for both imperative and JIT by enabling NHWC and pre-packing the weight. [#425](https://github.com/intel/intel-extension-for-pytorch/commit/ae33faf62bb63b204b0ee63acb8e29e24f6076f3)
+- Reduce the binary size. The C++ SDK is reduced from ~220MB to ~13.5MB while the wheel package is reduced from ~100MB to ~40MB.
+- Update oneDNN and oneDNN graph to [2.5.2](https://github.com/oneapi-src/oneDNN/releases/tag/v2.5.2) and [0.4.2](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.2) respectively.
+
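Most of the fusions listed above take effect on the TorchScript graph. A minimal sketch of how an optimized, frozen model is typically prepared (the toy `Net` module and input shape are illustrative assumptions, not part of the release notes) might look like:

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.linear = nn.Linear(8, 4)

    def forward(self, x):
        x = self.conv(x).relu()
        x = x.mean(dim=(2, 3))
        return self.linear(x) + 1.0  # binary op that the folding pass may absorb

model = Net().eval()
model = ipex.optimize(model)

x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    traced = torch.jit.trace(model, x)   # fusions are matched on the JIT graph
    traced = torch.jit.freeze(traced)
    y = traced(x)
```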
+### What's Changed
+**Full Changelog**: https://github.com/intel/intel-extension-for-pytorch/compare/v1.10.100...v1.11.0
+
## 1.10.100

This release is meant to fix the following issues:
- Resolve the issue that the PyTorch Tensor Expression(TE) did not work after importing the extension.
-- Wraps the BactchNorm(BN) as another operator to break the TE's BN-related fusions. Because the BatchNorm performance of PyTorch Tensor Expression can not achieve the same performance as PyTorch ATen BN.
+- Wraps the BactchNorm(BN) as another operator to break the TE's BN-related fusions. Because the BatchNorm performance of PyTorch Tensor Expression can not achieve the same performance as PyTorch ATen BN.
- Update the [documentation](https://intel.github.io/intel-extension-for-pytorch/)
-- Fix the INT8 quantization example issue #205
+- Fix the INT8 quantization example issue #205
- Polish the installation guide

## 1.10.0
@@ -149,7 +231,7 @@ class MyModel(nn.Module):

    def __init__(self):
        super(MyModel, self).__init__()
        self.conv = nn.Conv2d(10, 10, 3)
-
+
    def forward(self, x):
        x = self.conv(x)
        return x
