
Commit 4dabfa7

add new feature docs (#1286)
* add 1.13 new feature docs
* add hypertune strategy img
* update to intel_extension_for_pytorch.cpu.hypertune
* modify the docs of graph capture and recipe tuning
* modify recipe tuning doc
* fix typo in ipex.quantization.convert api doc
* update installation index html url and add link for ipex xpu
* align release notes to 1.12.300
* change auto_opt into auto_ipex
* remove the name of IPEX
* adjust xpu hp url and int8 feature docs url
* list new feature abstracts to features.rst
* add keywords of gpu for seo
* fine tune (×4)
1 parent 2bbce91 commit 4dabfa7

21 files changed: +807 −65 lines

docker/Dockerfile.conda  (+4 −4)

@@ -41,10 +41,10 @@ RUN curl -fsSL -v -o ~/miniconda.sh -O https://repo.anaconda.com/miniconda/Mini
     /opt/conda/bin/conda clean -ya
 
 FROM dev-base AS build
-ARG IPEX_VERSION=v1.12.100
-ARG PYTORCH_VERSION=v1.12.0
-ARG TORCHVISION_VERSION=0.13.0+cpu
-ARG TORCHAUDIO_VERSION=0.12.0+cpu
+ARG IPEX_VERSION=v1.13.0
+ARG PYTORCH_VERSION=v1.13.0
+ARG TORCHVISION_VERSION=0.14.0+cpu
+ARG TORCHAUDIO_VERSION=0.13.0+cpu
 COPY --from=conda /opt/conda /opt/conda
 RUN --mount=type=cache,target=/opt/ccache \
     python -m pip install --no-cache-dir torch==${PYTORCH_VERSION}+cpu torchvision==${TORCHVISION_VERSION} torchaudio==${TORCHAUDIO_VERSION} -f https://download.pytorch.org/whl/torch_stable.html && \

docker/Dockerfile.pip  (+4 −4)

@@ -30,10 +30,10 @@ RUN ${PYTHON} -m pip --no-cache-dir install --upgrade \
 # Some TF tools expect a "python" binary
 RUN ln -s $(which ${PYTHON}) /usr/local/bin/python
 
-ARG IPEX_VERSION=1.12.100
-ARG PYTORCH_VERSION=1.12.0+cpu
-ARG TORCHAUDIO_VERSION=0.12.0
-ARG TORCHVISION_VERSION=0.13.0+cpu
+ARG IPEX_VERSION=1.13.0
+ARG PYTORCH_VERSION=1.13.0+cpu
+ARG TORCHAUDIO_VERSION=0.13.0
+ARG TORCHVISION_VERSION=0.14.0+cpu
 ARG TORCH_CPU_URL=https://download.pytorch.org/whl/cpu/torch_stable.html
 ARG IPEX_URL=https://software.intel.com/ipex-whl-stable

docs/index.rst  (+19 −6)

@@ -1,6 +1,6 @@
 .. meta::
    :description: This website introduces Intel® Extension for PyTorch*
-   :keywords: Intel optimization, PyTorch, Intel® Extension for PyTorch*
+   :keywords: Intel optimization, PyTorch, Intel® Extension for PyTorch*, GPU, discrete GPU, Intel discrete GPU
 
 Welcome to Intel® Extension for PyTorch* Documentation
 ######################################################
@@ -11,19 +11,32 @@ Intel® Extension for PyTorch* provides optimizations for both eager mode and graph mode
 
 The extension can be loaded as a Python module for Python programs or linked as a C++ library for C++ programs. In Python scripts users can enable it dynamically by importing `intel_extension_for_pytorch`.
 
-**Note**: Check `here <https://intel.github.io/intel-extension-for-pytorch/xpu/latest/>`_ for detailed tutorials of Intel® Extension for PyTorch* for Intel® GPUs. Source code are available at the `xpu-master branch <https://github.com/intel/intel-extension-for-pytorch/tree/xpu-master>`_.
+-------------------------------------
 
 Intel® Extension for PyTorch* for CPU is structured as shown in the following figure:
 
-.. figure:: ../images/intel_extension_for_pytorch_structure.png
+.. figure:: ../images/intel_extension_for_pytorch_structure_cpu.png
    :width: 800
    :align: center
-   :alt: Structure of Intel® Extension for PyTorch*
+   :alt: Structure of Intel® Extension for PyTorch* for CPU
 
-PyTorch components are depicted with white boxes while Intel Extensions are with blue boxes. Extra performance of the extension is delivered via both custom addons and overriding existing PyTorch components. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimal optimizers and INT8 quantization API. Further performance boosting is available by converting the eager-mode model into graph mode via the extended graph fusion passes. Intel® Extension for PyTorch* dispatches the operators into their underlying kernels automatically based on ISA that it detects and leverages vectorization and matrix acceleration units available in Intel hardware, as much as possible. oneDNN library is used for computation intensive operations. Intel Extension for PyTorch runtime extension brings better efficiency with finer-grained thread runtime control and weight sharing.
+PyTorch components are depicted with white boxes, while extension components are depicted with blue boxes. Extra performance is delivered via both custom addons and overriding of existing PyTorch components. In eager mode, the PyTorch frontend is extended with custom Python modules (such as fusion modules), optimized optimizers and an INT8 quantization API. A further performance boost is available by converting the eager-mode model into graph mode via the extended graph fusion passes. Intel® Extension for PyTorch* dispatches operators to their underlying kernels automatically based on the ISA that it detects, and leverages the vectorization and matrix acceleration units available in Intel hardware as much as possible. The oneDNN library is used for computation-intensive operations. The runtime extension brings better efficiency with finer-grained thread runtime control and weight sharing.
 
-Intel® Extension for PyTorch* has been released as an open–source project at `Github <https://github.com/intel/intel-extension-for-pytorch>`_.
+Intel® Extension for PyTorch* for CPU has been released as an open-source project on the `GitHub master branch <https://github.com/intel/intel-extension-for-pytorch/tree/master>`_. Check the `CPU tutorial <https://intel.github.io/intel-extension-for-pytorch/cpu/latest/>`_ for detailed information on Intel® Extension for PyTorch* for Intel® CPUs.
+
+-------------------------------------
+
+Intel® Extension for PyTorch* for GPU is structured as shown in the following figure:
+
+.. figure:: ../images/intel_extension_for_pytorch_structure_gpu.svg
+   :width: 800
+   :align: center
+   :alt: Architecture of Intel® Extension for PyTorch* for GPU
+
+Intel® Extension for PyTorch* for GPU utilizes the `DPC++ <https://github.com/intel/llvm#oneapi-dpc-compiler>`_ compiler, which supports the latest `SYCL* <https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html>`_ standard as well as a number of extensions to the SYCL* standard, which can be found in the `sycl/doc/extensions <https://github.com/intel/llvm/tree/sycl/sycl/doc/extensions>`_ directory. Intel® Extension for PyTorch* also integrates the `oneDNN <https://github.com/oneapi-src/oneDNN>`_ and `oneMKL <https://github.com/oneapi-src/oneMKL>`_ libraries and provides kernels based on them. The oneDNN library is used for computation-intensive operations. The oneMKL library is used for fundamental mathematical operations.
+
+Intel® Extension for PyTorch* for GPU has been released as an open-source project on the `GitHub xpu-master branch <https://github.com/intel/intel-extension-for-pytorch/tree/xpu-master>`_. Check the `GPU tutorial <https://intel.github.io/intel-extension-for-pytorch/xpu/latest/>`_ for detailed information on Intel® Extension for PyTorch* for Intel® GPUs.
 
 .. toctree::
    :hidden:
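
To make the "enable it dynamically by importing" statement above concrete, a minimal hedged sketch of typical usage (the torchvision ResNet-50 is purely illustrative):

    import torch
    import torchvision.models as models
    import intel_extension_for_pytorch as ipex  # importing loads the extension

    model = models.resnet50(pretrained=True).eval()

    # Apply the extension's optimizations (operator fusion, prepacking, ...)
    model = ipex.optimize(model)

    with torch.no_grad():
        output = model(torch.randn(1, 3, 224, 224))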

docs/tutorials/api_doc.rst  (+4 −0)

@@ -16,6 +16,10 @@ Quantization
 .. autofunction:: prepare
 .. autofunction:: convert
 
+Experimental API; an introduction is available at the `feature page <./features/int8_recipe_tuning_api.md>`_.
+
+.. autofunction:: autotune
+
 CPU Runtime
 ***********
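
To ground the Quantization entries above, a hedged static-quantization sketch against the `prepare`/`convert` API; `MyModel` and `calib_loader` are hypothetical placeholders:

    import torch
    import intel_extension_for_pytorch as ipex
    from intel_extension_for_pytorch.quantization import prepare, convert

    model = MyModel().eval()                      # hypothetical FP32 model
    example_inputs = torch.randn(1, 3, 224, 224)

    # Built-in static quantization recipe
    qconfig = ipex.quantization.default_static_qconfig
    prepared = prepare(model, qconfig, example_inputs=example_inputs)

    # Calibrate with representative data
    with torch.no_grad():
        for x in calib_loader:
            prepared(x)

    quantized = convert(prepared)                 # INT8 model
    traced = torch.jit.trace(quantized, example_inputs)
    traced = torch.jit.freeze(traced)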

docs/tutorials/examples.md  (+3 −1)

@@ -1,6 +1,8 @@
 Examples
 ========
 
+**_NOTE:_** Check the individual feature pages for examples of feature usage. All features are listed in the [feature page](./features.rst).
+
 ## Training
 
 ### Single-instance Training
@@ -611,4 +613,4 @@ $ ldd example-app
 
 ## Model Zoo
 
-Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.12-models). A bunch of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r1.12-models/benchmarks#pytorch-use-cases). You can get performance benefits out-of-box by simply running scipts in the Model Zoo.
+Use cases that have already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/pytorch-r1.13-models). A number of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/pytorch-r1.13-models/benchmarks#pytorch-use-cases). You can get performance benefits out of the box by simply running scripts in the Model Zoo.

docs/tutorials/features.rst  (+53 −23)
@@ -31,30 +31,18 @@ For more detailed information, check `ISA Dynamic Dispatching <features/isa_dynamic_dispatch.md>`_.
 
    features/isa_dynamic_dispatch
 
-Channels Last
--------------
-
-Compared with the default NCHW memory format, using channels_last (NHWC) memory format could further accelerate convolutional neural networks. In Intel® Extension for PyTorch\*, NHWC memory format has been enabled for most key CPU operators, though not all of them have been accepted and merged into the PyTorch master branch yet.
-
-For more detailed information, check `Channels Last <features/nhwc.md>`_.
-
-.. toctree::
-   :hidden:
-   :maxdepth: 1
-
-   features/nhwc
-
 Auto Channels Last
 ------------------
 
-Intel® Extension for PyTorch* automatically converts the model to channels last memory format by default when users optimize their model with ``ipex.optimize(model)``.
+Compared to the default NCHW memory format, using the channels_last (NHWC) memory format can further accelerate convolutional neural networks. In Intel® Extension for PyTorch*, the NHWC memory format has been enabled for most key CPU operators. More detailed information is available at `Channels Last <features/nhwc.md>`_.
 
-For more detailed information, check `Auto Channels Last <features/auto_channels_last.md>`_.
+Intel® Extension for PyTorch* automatically converts a model to the channels_last memory format when users optimize the model with `ipex.optimize(model)`. With this feature, users no longer need to manually apply `model = model.to(memory_format=torch.channels_last)`. More detailed information is available at `Auto Channels Last <features/auto_channels_last.md>`_.
 
 .. toctree::
    :hidden:
    :maxdepth: 1
 
+   features/nhwc
    features/auto_channels_last
 
 Auto Mixed Precision (AMP)
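
A minimal sketch of the Auto Channels Last behavior described in the hunk above, assuming a hypothetical eval-mode CNN (`MyCNN` is a placeholder):

    import torch
    import intel_extension_for_pytorch as ipex

    model = MyCNN().eval()  # hypothetical convolutional model

    # ipex.optimize applies the channels_last conversion automatically;
    # the manual step below is no longer required:
    #   model = model.to(memory_format=torch.channels_last)
    model = ipex.optimize(model)

    x = torch.randn(1, 3, 224, 224).to(memory_format=torch.channels_last)
    with torch.no_grad():
        y = model(x)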
@@ -89,18 +77,21 @@ Compared to eager mode, graph mode in PyTorch normally yields better performance
 
 Operator Optimization
 ---------------------
 
-Intel® Extension for PyTorch\* also optimizes operators and implements several customized operators for performance boosts. A few ATen operators are replaced by their optimized counterparts in Intel® Extension for PyTorch\* via the ATen registration mechanism. Some customized operators are implemented for several popular topologies. For instance, ROIAlign and NMS are defined in Mask R-CNN. To improve performance of these topologies, Intel® Extension for PyTorch\* also optimized these customized operators.
+Intel® Extension for PyTorch* also optimizes operators and implements several customized operators for performance boosts. A few ATen operators are replaced by their optimized counterparts in Intel® Extension for PyTorch* via the ATen registration mechanism. Some customized operators are implemented for several popular topologies. For instance, ROIAlign and NMS are defined in Mask R-CNN. To improve performance of these topologies, Intel® Extension for PyTorch* also optimized these customized operators.
 
 .. currentmodule:: intel_extension_for_pytorch.nn
 .. autoclass:: FrozenBatchNorm2d
 
 .. currentmodule:: intel_extension_for_pytorch.nn.functional
 .. autofunction:: interaction
 
+**Auto kernel selection** is a feature that enables users to tune for better performance with GEMM operations. It is exposed as the boolean parameter `auto_kernel_selection` of the `ipex.optimize()` function. We aim to provide good default performance by leveraging the best of math libraries and enabling `weights_prepack` by default, and this has been verified with a broad set of models: by default, GEMM kernels are computed with oneMKL primitives, although under certain circumstances oneDNN primitives run faster. If you would like to try the alternative, set `auto_kernel_selection=True` in `ipex.optimize()` to run GEMM kernels with oneDNN primitives, and you can disable `weights_prepack` in `ipex.optimize()` if you care more about memory footprint than performance gain. In the majority of cases, however, we recommend keeping the defaults.
+
 Optimizer Optimization
 ----------------------
 
-Optimizers are one of key parts of the training workloads. Intel® Extension for PyTorch\* brings two types of optimizations to optimizers:
+Optimizers are one of the key parts of the training workloads. Intel® Extension for PyTorch* brings two types of optimizations to optimizers:
+
 1. Operator fusion for the computation in the optimizers.
 2. SplitSGD for BF16 training, which reduces the memory footprint of the master weights by half.
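
A hedged sketch of the two `ipex.optimize` toggles described in the hunk above, assuming `model` is an existing eval-mode FP32 module; the parameter names match the feature description, though defaults may differ across versions:

    import intel_extension_for_pytorch as ipex

    # Default: weight prepacking enabled, GEMMs backed by oneMKL
    optimized = ipex.optimize(model)

    # Let the extension run GEMM kernels with oneDNN primitives instead
    optimized = ipex.optimize(model, auto_kernel_selection=True)

    # Trade some performance for a smaller memory footprint
    optimized = ipex.optimize(model, weights_prepack=False)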

@@ -114,9 +105,8 @@ For more detailed information, check `Optimizer Fusion <features/optimizer_fusion.md>`_.
    features/optimizer_fusion
    features/split_sgd
 
-
 Runtime Extension
---------------------------------
+-----------------
 
 Intel® Extension for PyTorch* Runtime Extension provides PyTorch frontend APIs for users to get finer-grained control of the thread runtime and provides:
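
A hedged illustration of the runtime API referenced above; `CPUPool` and `pin` are the thread-control primitives from the runtime extension docs, while `model` and `x` are placeholders, and the exact constructor arguments may vary by version:

    import torch
    import intel_extension_for_pytorch as ipex

    # Pin computation to a chosen set of physical cores (assumed kwarg name)
    cpu_pool = ipex.cpu.runtime.CPUPool(core_ids=[0, 1, 2, 3])

    with ipex.cpu.runtime.pin(cpu_pool):
        with torch.no_grad():
            y = model(x)  # model and x are hypothetical placeholders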

@@ -135,14 +125,54 @@ For more detailed information, check `Runtime Extension <features/runtime_extension.md>`_.
    features/runtime_extension
 
 INT8 Quantization
---------------------------------
+-----------------
+
+Intel® Extension for PyTorch* provides built-in quantization recipes to deliver good statistical accuracy for most popular DL workloads, including CNN, NLP and recommendation models. On top of that, if users would like to tune for higher accuracy than what the default recipe provides, a recipe tuning API powered by Intel® Neural Compressor is provided for users to try.
+
+More detailed information is available at `INT8 Quantization <features/int8_overview.md>`_ and the `INT8 recipe tuning API guide (Experimental, *NEW feature in 1.13.0*) <features/int8_recipe_tuning_api.md>`_.
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+
+   features/int8_overview
+   features/int8_recipe_tuning_api
+
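
A hedged sketch of the recipe tuning API named above; `ipex.quantization.autotune` is the documented entry point, but treat the argument names below as indicative only, and `prepared_model`, `calib_loader` and `evaluate` as hypothetical placeholders:

    import intel_extension_for_pytorch as ipex

    def eval_fn(model):
        # Placeholder: return a scalar accuracy on a validation set
        return evaluate(model)

    tuned_model = ipex.quantization.autotune(
        prepared_model,                          # output of ipex.quantization.prepare
        calib_loader,                            # calibration DataLoader
        eval_fn,                                 # higher return value is better
        sampling_sizes=[100],                    # assumed knobs; see the recipe
        accuracy_criterion={"relative": 0.01},   # tuning API guide for details
        tuning_time=0,                           # 0 means no time limit
    )
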
+Codeless Optimization (Experimental, *NEW feature in 1.13.0*)
+-------------------------------------------------------------
+
+This feature enables users to get performance benefits from Intel® Extension for PyTorch* without changing their Python scripts. It aims to ease usage and has been verified to work well with a broad scope of models, though in a few cases there could be a small overhead compared to applying optimizations with Intel® Extension for PyTorch* APIs directly.
+
+For more detailed information, check `Codeless Optimization <features/codeless_optimization.md>`_.
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+
+   features/codeless_optimization.md
+
+Graph Capture (Experimental, *NEW feature in 1.13.0*)
+-----------------------------------------------------
+
+Since graph mode is key to deployment performance, this feature automatically captures graphs based on a set of technologies that PyTorch supports, such as TorchScript and TorchDynamo. Users won't need to learn and try different PyTorch APIs to capture graphs; instead, they can turn on the new boolean parameter `graph_mode` (default off) of `ipex.optimize` to get the best of graph optimization.
+
+For more detailed information, check `Graph Capture <features/graph_capture.md>`_.
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+
+   features/graph_capture
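
A minimal hedged sketch of the `graph_mode` parameter described above (`MyModel` is a placeholder; the capture path actually chosen depends on the model and PyTorch version):

    import torch
    import intel_extension_for_pytorch as ipex

    model = MyModel().eval()  # hypothetical model

    # Ask the extension to capture a graph automatically
    model = ipex.optimize(model, graph_mode=True)

    with torch.no_grad():
        y = model(torch.randn(1, 3, 224, 224))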
+
+HyperTune (Experimental, *NEW feature in 1.13.0*)
+-------------------------------------------------
 
-Intel® Extension for PyTorch* has built-in quantization recipes to deliver good statistical accuracy for most popular DL workloads including CNN, NLP and recommendation models.
+HyperTune is an experimental feature for hyperparameter and execution-configuration searching. Such searching is used in various areas, for example, optimizing the hyperparameters of deep learning models. It is extremely useful in real situations where the number of hyperparameters, including the configuration of script execution, and their search spaces are so large that tuning them manually would be impractical and time-consuming. HyperTune automates this execution-configuration search for the `launcher <performance_tuning/launch_script.md>`_ and Intel® Extension for PyTorch*.
 
-Check more detailed information for `INT8 Quantization <features/int8.md>`_.
+For more detailed information, check `HyperTune <features/hypertune.md>`_.
 
 .. toctree::
    :hidden:
    :maxdepth: 1
 
-   features/int8
+   features/hypertune
