Commit d875788: update release notes for 2.4.0 (#3194)

1 parent 676ea03 commit d875788

docs/tutorials/releases.md (1 file changed, +63 -31 lines)

@@ -1,31 +1,63 @@
Releases
========

## 2.4.0

We are excited to announce the release of Intel® Extension for PyTorch\* 2.4.0+cpu, which accompanies PyTorch 2.4. This release brings support for Llama 3.1, basic support for LLM serving frameworks such as vLLM and TGI, and a set of optimizations that push LLM performance further. It also extends the list of optimized LLM models and includes a number of bug fixes and small optimizations. We sincerely thank our dedicated community for your contributions. As always, we encourage you to try this release and share feedback to help us improve the product further.

### Highlights

- Llama 3.1 support

  Meta has newly released [Llama 3.1](https://ai.meta.com/blog/meta-llama-3-1/) with new features such as a longer context length (128K). Intel® Extension for PyTorch\* has provided [support for Llama 3.1](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-ai-solutions-support-meta-llama-3-1-launch.html) since its launch date through an early release version, and now supports it in this official release.

- Serving framework support

  Typical LLM serving frameworks, including vLLM and TGI, can now work with Intel® Extension for PyTorch\*, which provides optimized performance on Xeon® Scalable CPUs. Besides the integration of LLM serving frameworks with the ipex.llm module-level APIs, we continue to optimize the performance and quality of the underlying Intel® Extension for PyTorch\* operators such as paged attention and flash attention. We also added support in the ipex.llm module-level APIs for 4-bit AWQ quantization based on weight-only quantization, and for distributed communication with shared-memory optimization. A minimal usage sketch of the ipex.llm frontend is shown after this list.

- Large Language Model (LLM) optimization

  Intel® Extension for PyTorch\* further optimized the performance of the weight-only quantization kernels, enabled more fusion pattern variants for LLMs, and extended the optimized models to include Whisper, Falcon-11B, Qwen2, and of course Llama 3.1. A full list of optimized models can be found at [LLM optimization](https://github.com/intel/intel-extension-for-pytorch/tree/v2.4.0+cpu/examples/cpu/llm/inference).

- Bug fixes and other optimizations

  - Fixed quantization with auto-mixed-precision (AMP) mode for Qwen-7b [#3030](https://github.com/intel/intel-extension-for-pytorch/commit/ad29b2346fe0b26e87e1aefc15e1eb25fb4b9b4d)
  - Fixed the illegal memory access issue in the Flash Attention kernel [#2987](https://github.com/intel/intel-extension-for-pytorch/commit/620a9bfd9db42813931a857e78fa3f5d298be200)
  - Re-structured the paths of the LLM example scripts [#3080](https://github.com/intel/intel-extension-for-pytorch/commit/bee4a423d99b4dea7362d8cb31b1d48e38344a8f)
  - Upgraded oneDNN to v3.5.2 [#3143](https://github.com/intel/intel-extension-for-pytorch/commit/7911528f0fef4e1b493cb0b363bf76de2eb6a9ca)
  - Misc fixes and enhancements [#3079](https://github.com/intel/intel-extension-for-pytorch/commit/e74d7a97186e6cafc8e41c2b40f03e95fe6c8060) [#3116](https://github.com/intel/intel-extension-for-pytorch/commit/76dfb92af8aa4778aff09a089bde70f614712b33)
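
For context, here is a minimal sketch of how a serving stack (or any inference script) might call into the ipex.llm frontend mentioned above. It is a sketch only: it assumes the Hugging Face `transformers` package and the `ipex.llm.optimize` API from the LLM examples, the model id and generation settings are purely illustrative, and exact arguments may differ between versions.

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model id; any model from the optimized-model list works similarly
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# Apply the ipex.llm frontend optimizations (fused attention, weight prepacking, etc.)
model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)

# Generate as usual; a serving framework drives the same optimized model
inputs = tokenizer("What is the capital of France?", return_tensors="pt")
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
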
**Full Changelog**: https://github.com/intel/intel-extension-for-pytorch/compare/v2.3.0+cpu...v2.4.0+cpu
## 2.3.100

### Highlights

- Added the optimization for Phi-3: [#2883](https://github.com/intel/intel-extension-for-pytorch/commit/5fde074252d9b61dd0d410832724cbbec882cb96)

- Fixed the `state_dict` method patched by `ipex.optimize` to support DistributedDataParallel [#2910](https://github.com/intel/intel-extension-for-pytorch/commit/9a192efa4cf9a9a2dabac19e57ec5d81f9f5d22c)

- Fixed the linking issue in CPPSDK [#2911](https://github.com/intel/intel-extension-for-pytorch/commit/38573f2938061620f072346d2b3345b69454acbc)

- Fixed the ROPE kernel for cases where the batch size is larger than one [#2928](https://github.com/intel/intel-extension-for-pytorch/commit/2d02768af957011244dd9ca89186cc1318466d6c)

- Upgraded deepspeed to v0.14.3 to include the support for Phi-3 [#2985](https://github.com/intel/intel-extension-for-pytorch/commit/73105990e551656f79104dd93adc4a8020978947)

**Full Changelog**: https://github.com/intel/intel-extension-for-pytorch/compare/v2.3.0+cpu...v2.3.100+cpu

## 2.3.0

We are excited to announce the release of Intel® Extension for PyTorch\* 2.3.0+cpu, which accompanies PyTorch 2.3. This release mainly brings a new Large Language Model (LLM) feature, the module-level LLM optimization API, which provides module-level optimizations for commonly used LLM modules and functionalities and targets customized LLM modeling scenarios such as private models, self-customized models, and LLM serving frameworks. It also extends the list of optimized LLM models and includes a number of bug fixes and small optimizations. We sincerely thank our dedicated community for your contributions. As always, we encourage you to try this release and share feedback to help us improve the product further.

### Highlights

- Large Language Model (LLM) optimization

  [Intel® Extension for PyTorch\*](https://github.com/intel/intel-extension-for-pytorch) provides a new feature, the module-level LLM optimization API, which offers module-level optimizations for commonly used LLM modules and functionalities. LLM creators can use this API set to replace the corresponding parts of their own models and thereby reach peak performance; a tiny illustrative sketch follows.
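
  As a purely illustrative sketch of this replacement idea, the snippet below swaps a model's RMSNorm layers for a drop-in implementation. `OptimizedRMSNorm` and `swap_norm_layers` are hypothetical stand-ins written for this note, not Intel® Extension for PyTorch\* APIs; the real module-level APIs are provided under the `ipex.llm` namespace.

  ```python
  import torch
  import torch.nn as nn

  class OptimizedRMSNorm(nn.Module):
      """Hypothetical stand-in for an optimized module-level implementation."""
      def __init__(self, weight: torch.Tensor, eps: float = 1e-6):
          super().__init__()
          self.weight = nn.Parameter(weight)
          self.eps = eps

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          # Standard RMSNorm: scale by the reciprocal root-mean-square of the last dim
          variance = x.pow(2).mean(-1, keepdim=True)
          return self.weight * x * torch.rsqrt(variance + self.eps)

  def swap_norm_layers(model: nn.Module) -> nn.Module:
      # Recursively replace every *RMSNorm submodule with the optimized equivalent
      for name, module in model.named_children():
          if module.__class__.__name__.endswith("RMSNorm"):
              setattr(model, name, OptimizedRMSNorm(module.weight.detach().clone()))
          else:
              swap_norm_layers(module)
      return model
  ```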

  In general, there are 3 categories of module-level LLM optimization APIs:

@@ -77,10 +109,10 @@ We are excited to announce the release of Intel® Extension for PyTorch* 2.3.0+c

- Bug fixes and other optimizations

  - Optimized the performance of LLM [#2561](https://github.com/intel/intel-extension-for-pytorch/commit/ade45387ecc4e707754de9db6fc2be0af186e2ba) [#2584](https://github.com/intel/intel-extension-for-pytorch/commit/05d07645e1ae5eeeff15abda31a6ba5806dd2bb2) [#2617](https://github.com/intel/intel-extension-for-pytorch/commit/adb563834a4f6bd327d7307c493c8fe1648e6211) [#2663](https://github.com/intel/intel-extension-for-pytorch/commit/214dea0c8e7b2864a0c2d1a1c32fb7815ca68070) [#2733](https://github.com/intel/intel-extension-for-pytorch/commit/f5b941c3b7ea8fe1a387617a9329467d1e1b544a)
  - Supported Act Order of GPTQ [#2550](https://github.com/intel/intel-extension-for-pytorch/commit/be636289eef628b995e79a475c58f8a4d93e4890) [#2568](https://github.com/intel/intel-extension-for-pytorch/commit/9fcc4897492333330fb6bd156b1178d55347d292)
  - Improved the warning and the logging information for better user experience [#2641](https://github.com/intel/intel-extension-for-pytorch/commit/e0bf673cf3ea4063a7e168ec221f421fbd378fb3) [#2675](https://github.com/intel/intel-extension-for-pytorch/commit/770275a755ea0445675720a3f6f14e77c491fceb)
  - Added TorchServe CPU Example [#2613](https://github.com/intel/intel-extension-for-pytorch/commit/1f6fe6423dde7ccecc1565e73dc81d9cb281bc1f)
  - Upgraded oneDNN to v3.4.1 [#2747](https://github.com/intel/intel-extension-for-pytorch/commit/e2a9af49874fcf39097036c08848cd37cadc0084)
  - Misc fixes and enhancements [#2468](https://github.com/intel/intel-extension-for-pytorch/commit/f88a7d127a6a3017db508454c7d332d7b2ad83f6) [#2627](https://github.com/intel/intel-extension-for-pytorch/commit/bc32ea463084d711e4a9aae85e38dd5d7d427849) [#2631](https://github.com/intel/intel-extension-for-pytorch/commit/f55a2bfa5d505fb7c7a6225c1c6206b5926777ab) [#2704](https://github.com/intel/intel-extension-for-pytorch/commit/eae477f76356b5a83640941787a168f680334775)

@@ -202,7 +234,7 @@ We are pleased to announce the release of Intel® Extension for PyTorch\* 2.0.0-

- **MHA optimization with Flash Attention**: Intel optimized the MHA module with the Flash Attention technique, inspired by the [Stanford paper](https://arxiv.org/abs/2205.14135). This reduces memory consumption for LLMs and also provides better inference performance for models like BERT, Stable Diffusion, etc.

- **Work with torch.compile as a backend (Experimental)**: PyTorch 2.0 introduces a new feature, `torch.compile`, to speed up PyTorch execution. We've enabled Intel® Extension for PyTorch as a backend of `torch.compile`, which can leverage this new PyTorch API's power of graph capture and provide additional optimizations based on these graphs. Using this new feature is quite simple, as shown below:

```python
import torch
@@ -217,7 +249,7 @@ model = torch.compile(model, backend='ipex')
- Supported [RMSNorm](https://arxiv.org/abs/1910.07467), which is widely used in the T5 model from Hugging Face [#1341](https://github.com/intel/intel-extension-for-pytorch/commit/d1de1402a8d6b9ca49b9c9a45a92899f7566866a)
- Optimized InstanceNorm [#1330](https://github.com/intel/intel-extension-for-pytorch/commit/8b97d2998567cc2fda6eb008194cd64f624e857f)
- Fixed the quantization of LSTM [#1414](https://github.com/intel/intel-extension-for-pytorch/commit/a4f93c09855679d2b424ca5be81930e3a4562cef) [#1473](https://github.com/intel/intel-extension-for-pytorch/commit/5b44996dc0fdb5c45995d403e18a44f2e1a11b3d)
- Fixed the correctness issue of unpacking non-contiguous Linear weight [#1419](https://github.com/intel/intel-extension-for-pytorch/commit/84d413d6c10e16c025c407b68652b1769597e016)
- oneDNN update [#1488](https://github.com/intel/intel-extension-for-pytorch/commit/fd5c10b664d19c87f8d94cf293077f65f78c3937)

### Known Issues
@@ -273,7 +305,7 @@ We are pleased to announce the release of Intel® Extension for PyTorch\* 1.13.0
--model_name_or_path bert-base-uncased --dataset_name squad --do_eval \
--per_device_train_batch_size 12 --learning_rate 3e-5 --num_train_epochs 2 \
--max_seq_length 384 --doc_stride 128 --output_dir /tmp/debug_squad/

# automatically apply bfloat16 optimization (--auto-ipex --dtype bfloat16)
ipexrun --use_default_allocator --ninstance 2 --ncore_per_instance 28 --auto_ipex --dtype bfloat16 run_qa.py \
--model_name_or_path bert-base-uncased --dataset_name squad --do_eval \
@@ -363,7 +395,7 @@ Highlights include:
</tr>
<tr>
<td valign="top">

```python
import intel_extension_for_pytorch as ipex
# Calibrate the model
@@ -376,17 +408,17 @@ Highlights include:
conf = ipex.quantization.QuantConf('qconfig.json')
with torch.no_grad():
    traced_model = ipex.quantization.convert(model, conf, example_input)
# Do inference
y = traced_model(x)
```

</td>
<td valign="top">

```python
import intel_extension_for_pytorch as ipex
# Calibrate the model
qconfig = ipex.quantization.default_static_qconfig  # Histogram calibration algorithm and
calibrated_model = ipex.quantization.prepare(model_to_be_calibrated, qconfig, example_inputs=example_inputs)
for data in calibration_data_set:
    calibrated_model(data)
@@ -395,10 +427,10 @@ Highlights include:
with torch.no_grad():
    traced_model = torch.jit.trace(quantized_model, example_input)
    traced_model = torch.jit.freeze(traced_model)
# Do inference
y = traced_model(x)
```

</td>
</tr>
</tbody>
@@ -414,18 +446,18 @@ Highlights include:
</tr>
<tr>
<td valign="top">

```python
import intel_extension_for_pytorch as ipex
# Create CPU pool
cpu_pool = ipex.cpu.runtime.CPUPool(node_id=0)
# Create multi-stream model
multi_Stream_model = ipex.cpu.runtime.MultiStreamModule(model, num_streams=2, cpu_pool=cpu_pool)
```

</td>
<td valign="top">

```python
import intel_extension_for_pytorch as ipex
# Create CPU pool
@@ -438,7 +470,7 @@ Highlights include:
multi_stream_input_hint, # optional
multi_stream_output_hint ) # optional
```

</td>
</tr>
</tbody>
@@ -454,26 +486,26 @@ Highlights include:
</tr>
<tr>
<td valign="top">

```python
import intel_extension_for_pytorch as ipex
model = ...
model.load_state_dict(torch.load(PATH))
model.eval()
optimized_model = ipex.optimize(model, dtype=torch.bfloat16)
```

</td>
<td valign="top">

```python
import intel_extension_for_pytorch as ipex
model = ...
model.load_state_dict(torch.load(PATH))
model.eval()
optimized_model = ipex.optimize(model, dtype=torch.bfloat16, sample_input=input)
```

</td>
</tr>
</tbody>
@@ -577,7 +609,7 @@ We are excited to announce Intel® Extension for PyTorch\* 1.11.0-cpu release by
### Highlights

- Combine the AVX2 and AVX512 binaries into a single binary and automatically dispatch to different implementations based on hardware ISA detection at runtime. The typical case is serving a data center that mixes AVX2-only and AVX512 platforms. Compared to the previous version, there is no need to deploy separate binaries per ISA.

***NOTE***: The extension uses the oneDNN library as the backend. However, the BF16 and INT8 operator sets and features differ between AVX2 and AVX512. Refer to the [oneDNN document](https://oneapi-src.github.io/oneDNN/dev_guide_int8_computations.html#processors-with-the-intel-avx2-or-intel-avx-512-support) for more details.

> When one input is of type u8, and the other one is of type s8, oneDNN assumes the user will choose the quantization parameters so no overflow/saturation occurs. For instance, a user can use u7 [0, 127] instead of u8 for the unsigned input, or s7 [-64, 63] instead of the s8 one. It is worth mentioning that this is required only when the Intel AVX2 or Intel AVX512 Instruction Set is used.
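
A small arithmetic sketch of this guidance (plain Python for illustration, not an Intel® Extension for PyTorch\* API; the calibration maximum of 6.0 is an assumed value):

```python
# Choose the quantization scale so an unsigned input stays in u7 [0, 127]
# rather than u8 [0, 255], avoiding the overflow/saturation case described above.
fp32_max = 6.0                 # assumed maximum activation value observed during calibration

scale_u8 = fp32_max / 255.0    # full u8 range: u8 x s8 products may saturate on AVX2/AVX512
scale_u7 = fp32_max / 127.0    # restricted u7 range: safe per the oneDNN note above

x = 5.3                        # example activation value
print(round(x / scale_u8))     # 225 -> outside the safe [0, 127] range
print(round(x / scale_u7))     # 112 -> stays within [0, 127]
```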
@@ -606,7 +638,7 @@ pip install intel_extension_for_pytorch
</tbody>
</table>

- Compared to the previous version, this release provides a dedicated installation file for the C++ SDK. The installation file automatically detects the PyTorch C++ SDK location and installs the extension's C++ SDK files into the PyTorch C++ SDK. Users no longer need to manually add the extension's C++ SDK source files and CMake to the PyTorch SDK. In addition, the installation file reduces the C++ SDK binary size from ~220MB to ~13.5MB.

<table align="center">
<tbody>
