
Commit 2798b43

jingxu10, tye1, and zhuhong61 authored
[Doc] highlight some features as experimental (#2152)
* generic python
* Update feature list in release note
* fine tune, add experimental to horovod, simple trace and profiler_legacy
* Update CPU part in release note
* add cpu to OS matrix
* DDP doc: Add torch-ccl source build command for cpu (#2159)

---------

Co-authored-by: Ye Ting <[email protected]>
Co-authored-by: zhuhong61 <[email protected]>
1 parent c8c71d2 commit 2798b43

15 files changed: 70 additions and 45 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -126,6 +126,7 @@ build_ios
 .build_release/*
 distribute/*
 dist/
+docs/_build/
 *.testbin
 *.bin
 cmake_build

docs/tutorials/features.rst

Lines changed: 6 additions & 6 deletions
@@ -66,9 +66,9 @@ On Intel® GPUs, quantization usages follow PyTorch default quantization APIs. C
 Distributed Training
 --------------------

-To meet demands of large scale model training over multiple devices, distributed training on Intel® GPUs and CPUs are supported. Two alternative methodologies are available. Users can choose either to use PyTorch native distributed training module, `Distributed Data Parallel (DDP) <https://pytorch.org/docs/stable/notes/ddp.html>`_, with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support via `Intel® oneCCL Bindings for PyTorch (formerly known as torch_ccl) <https://github.com/intel/torch-ccl>`_ or use Horovod with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support.
+To meet demands of large scale model training over multiple devices, distributed training on Intel® GPUs and CPUs are supported. Two alternative methodologies are available. Users can choose either to use PyTorch native distributed training module, `Distributed Data Parallel (DDP) <https://pytorch.org/docs/stable/notes/ddp.html>`_, with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support via `Intel® oneCCL Bindings for PyTorch (formerly known as torch_ccl) <https://github.com/intel/torch-ccl>`_ or use Horovod with `Intel® oneAPI Collective Communications Library (oneCCL) <https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html>`_ support (Experimental).

-For more detailed information, check `DDP <features/DDP.md>`_ and `Horovod <features/horovod.md>`_.
+For more detailed information, check `DDP <features/DDP.md>`_ and `Horovod (Experimental) <features/horovod.md>`_.

 .. toctree::
    :hidden:
@@ -122,8 +122,8 @@ For more detailed information, check `Advanced Configuration <features/advanced_
    features/advanced_configuration


-Legacy Profiler Tool
---------------------
+Legacy Profiler Tool (Experimental)
+-----------------------------------

 The legacy profiler tool is an extension of PyTorch* legacy profiler for profiling operators' overhead on XPU devices. With this tool, users can get the information in many fields of the run models or code scripts. User should build Intel® Extension for PyTorch* with profiler support as default and enable this tool by a `with` statement before the code segment.

@@ -135,8 +135,8 @@ For more detailed information, check `Legacy Profiler Tool <features/profiler_le

    features/profiler_legacy

-Simple Trace Tool
------------------
+Simple Trace Tool (Experimental)
+--------------------------------

 Simple Trace is a built-in debugging tool that lets you control printing out the call stack for a piece of code. Once enabled, it can automatically print out verbose messages of called operators in a stack format with indenting to distinguish the context.
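
For reference, the `with` statement mentioned in the Legacy Profiler Tool paragraph above would look roughly like the sketch below. The `use_xpu` flag and the `self_xpu_time_total` sort key are assumptions inferred from the XPU support described there, so treat this as an illustration rather than the release's exact API.

```python
import torch
import intel_extension_for_pytorch  # noqa: F401  # registers the "xpu" device

# Assumption: a model and input already live on an XPU device.
model = torch.nn.Linear(16, 16).to("xpu")
data = torch.randn(4, 16, device="xpu")

# Wrap only the code segment to be profiled in the legacy profiler context.
# The `use_xpu` keyword is assumed to be the switch that records XPU operator time.
with torch.autograd.profiler_legacy.profile(enabled=True, use_xpu=True) as prof:
    output = model(data)

# Print per-operator overhead; the sort key mirrors the CUDA
# "self_cuda_time_total" field and is likewise an assumption here.
print(prof.key_averages().table(sort_by="self_xpu_time_total"))
```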

docs/tutorials/features/DDP.md

Lines changed: 29 additions & 11 deletions
@@ -1,14 +1,15 @@
-# DistributedDataParallel (DDP)
+DistributedDataParallel (DDP)
+=============================

 ## Introduction

 `DistributedDataParallel (DDP)` is a PyTorch\* module that implements multi-process data parallelism across multiple GPUs and machines. With DDP, the model is replicated on every process, and each model replica is fed a different set of input data samples. DDP enables overlapping between gradient communication and gradient computations to speed up training. Please refer to [DDP Tutorial](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) for an introduction to DDP.

-The PyTorch `Collective Communication (c10d)` library supports communication across processes. To run DDP on XPU, we use Intel® oneCCL Bindings for Pytorch\* (formerly known as torch-ccl) to implement the PyTorch c10d ProcessGroup API (https://github.com/intel/torch-ccl). It holds PyTorch bindings maintained by Intel for the Intel® oneAPI Collective Communications Library\* (oneCCL), a library for efficient distributed deep learning training implementing such collectives as `allreduce`, `allgather`, and `alltoall`. Refer to [oneCCL Github page](https://github.com/oneapi-src/oneCCL) for more information about oneCCL.
+The PyTorch `Collective Communication (c10d)` library supports communication across processes. To run DDP on GPU, we use Intel® oneCCL Bindings for Pytorch\* (formerly known as torch-ccl) to implement the PyTorch c10d ProcessGroup API (https://github.com/intel/torch-ccl). It holds PyTorch bindings maintained by Intel for the Intel® oneAPI Collective Communications Library\* (oneCCL), a library for efficient distributed deep learning training implementing such collectives as `allreduce`, `allgather`, and `alltoall`. Refer to [oneCCL Github page](https://github.com/oneapi-src/oneCCL) for more information about oneCCL.

 ## Installation of Intel® oneCCL Bindings for Pytorch\*

-To use PyTorch DDP on XPU, install Intel® oneCCL Bindings for Pytorch\* as described below.
+To use PyTorch DDP on GPU, install Intel® oneCCL Bindings for Pytorch\* as described below.

 ### Install PyTorch and Intel® Extension for PyTorch\*

@@ -19,6 +20,18 @@ For more detailed information, check [installation guide](../installation.md).

 #### Install from source:

+Installation for CPU:
+
+```bash
+git clone https://github.com/intel/torch-ccl.git -b v1.13.0
+cd torch-ccl
+git submodule sync
+git submodule update --init --recursive
+python setup.py install
+```
+
+Installation for GPU:
+
 ```bash
 git clone https://github.com/intel/torch-ccl.git -b v1.13.100+gpu
 cd torch-ccl
@@ -29,19 +42,24 @@ BUILD_NO_ONECCL_PACKAGE=ON COMPUTE_BACKEND=dpcpp python setup.py install

 #### Install from prebuilt wheel:

-Installation for CPU:
+Prebuilt wheel files for CPU, GPU with generic Python\* and GPU with Intel® Distribution for Python\* are released in separate repositories.

-```bash
-python -m pip install oneccl_bind_pt -f https://developer.intel.com/ipex-whl-stable-cpu
+```
+# Generic Python* for CPU
+REPO_URL: https://developer.intel.com/ipex-whl-stable-cpu
+# Generic Python* for GPU
+REPO_URL: https://developer.intel.com/ipex-whl-stable-xpu
+# Intel® Distribution for Python*
+REPO_URL: https://developer.intel.com/ipex-whl-stable-xpu-idp
 ```

-Installation for GPU:
+Installation from either repository shares the command below. Replace the place holder `<REPO_URL>` with a real URL mentioned above.

 ```bash
-python -m pip install oneccl_bind_pt -f https://developer.intel.com/ipex-whl-stable-xpu
+python -m pip install oneccl_bind_pt -f <REPO_URL>
 ```

-**Note:** Make sure you have installed basekit from https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#base-kit
+**Note:** Make sure you have installed [basekit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#base-kit) when using Intel® oneCCL Bindings for Pytorch\* on Intel® GPUs.

 ```bash
 source $basekit_root/ccl/latest/env/vars.sh
@@ -165,7 +183,7 @@ For using one GPU card with multiple tiles, each tile could be regarded as a dev

 ### Usage of DDP scaling API

-Note: This API supports XPU devices on one card.
+Note: This API supports GPU devices on one card.

 ```python
 Args:
@@ -221,5 +239,5 @@ print("DDP Use XPU: {} for training".format(xpu))
 train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=args.batch_size, shuffle=(train_sampler is None),
                                            num_workers=args.workers, pin_memory=True, sampler=train_sampler)
 ```
-Then you can start your model training on multiple XPU devices of one card.
+Then you can start your model training on multiple GPU devices of one card.
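
Putting the pieces from this file together, a minimal initialization sketch with the oneCCL backend might look as follows. The environment-variable handling (`MASTER_ADDR`/`MASTER_PORT`, plus `PMI_RANK`/`PMI_SIZE` as typically exported by an mpirun launch) and the one-device-per-rank layout are illustrative assumptions rather than the document's exact recipe.

```python
import os
import torch
import torch.distributed as dist
import intel_extension_for_pytorch  # noqa: F401  # enables the "xpu" device
import oneccl_bindings_for_pytorch  # noqa: F401  # registers the "ccl" backend (formerly torch_ccl)

# Assumption: processes are launched with mpirun, which exposes PMI_RANK/PMI_SIZE.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
rank = int(os.environ.get("PMI_RANK", 0))
world_size = int(os.environ.get("PMI_SIZE", 1))

# oneCCL provides the c10d ProcessGroup implementation behind the "ccl" backend.
dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)

# Assumption: one GPU (xpu) device per rank.
device = torch.device(f"xpu:{rank}")
model = torch.nn.Linear(16, 16).to(device)

# Replicate the model per process; gradients are averaged via oneCCL allreduce.
# DDP picks up the device from the module's parameters here.
ddp_model = torch.nn.parallel.DistributedDataParallel(model)
```

From here, a `DistributedSampler`-backed `DataLoader` such as the one shown in the diff above feeds each rank its own shard of the dataset.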

docs/tutorials/features/DLPack.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-
-# DLPack Solution
+DLPack Solution
+===============

 ## Introduction

docs/tutorials/features/DPC++_Extension.md

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
-# DPC++ Extension
+DPC++ Extension
+===============

 ## Introduction

docs/tutorials/features/advanced_configuration.md

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
-# Advanced Configuration
+Advanced Configuration
+======================

 The default settings for Intel® Extension for PyTorch\* are sufficient for most use cases. However, if users want to customize Intel® Extension for PyTorch\*, advanced configuration is available at build time and runtime.

docs/tutorials/features/amp_cpu.md

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
-# Auto Mixed Precision (AMP) on CPU
+Auto Mixed Precision (AMP) on CPU
+=================================

 ## Introduction

docs/tutorials/features/amp_gpu.md

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
-# Auto Mixed Precision (AMP) on GPU
+Auto Mixed Precision (AMP) on GPU
+=================================

 ## Introduction

docs/tutorials/features/horovod.md

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
-# Horovod with PyTorch
+Horovod with PyTorch (Experimental)
+===================================

 Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use. Horovod core principles are based on MPI concepts such as size, rank, local rank, allreduce, allgather, broadcast, and alltoall. To use Horovod with PyTorch, you need to install Horovod with Pytorch first, and make specific change for Horovod in your training script.
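
As a sketch of the "specific change" the paragraph above refers to, the usual Horovod-with-PyTorch pattern is shown below; pinning each rank to an `xpu` device (instead of the customary `cuda` device) is an assumption about how this looks with Intel® Extension for PyTorch\*.

```python
import torch
import intel_extension_for_pytorch  # noqa: F401  # assumption: provides the "xpu" device
import horovod.torch as hvd

hvd.init()

# Pin each process to one device, mirroring the cuda-based Horovod examples.
device = torch.device(f"xpu:{hvd.local_rank()}")

model = torch.nn.Linear(16, 16).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Average gradients across ranks with allreduce on every optimizer step.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters()
)

# Start all ranks from the same weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```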

docs/tutorials/features/profiler_legacy.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-
-# Legacy Profiler Tool
+Legacy Profiler Tool (Experimental)
+===================================

 ## Introduction
