Intel® Extension for PyTorch*

Intel® Extension for PyTorch* extends PyTorch* with up-to-date feature optimizations for an extra performance boost on Intel hardware. Optimizations take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs, as well as Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs through the PyTorch* xpu device.

ipex.llm - Large Language Models (LLMs) Optimization

In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting with release 2.1.0, Intel® Extension for PyTorch* introduces specific optimizations for certain LLMs. Check LLM optimizations for details.

Optimized Model List

MODEL FAMILY	MODEL NAME (Hugging Face Hub)	FP32	BF16	Static quantization INT8	Weight-only quantization INT8	Weight-only quantization INT4
LLAMA	meta-llama/Llama-2-7b-hf	✅	✅	✅	✅	✅
LLAMA	meta-llama/Llama-2-13b-hf	✅	✅	✅	✅	✅
LLAMA	meta-llama/Llama-2-70b-hf	✅	✅	✅	✅	✅
LLAMA	meta-llama/Meta-Llama-3-8B	✅	✅	✅	✅	✅
LLAMA	meta-llama/Meta-Llama-3-70B	✅	✅	✅	✅	✅
LLAMA	meta-llama/Meta-Llama-3.1-8B-Instruct	✅	✅	✅	✅	✅
LLAMA	meta-llama/Llama-3.2-3B-Instruct	✅	✅	✅	✅	✅
LLAMA	meta-llama/Llama-3.2-11B-Vision-Instruct	✅	✅		✅	✅
GPT-J	EleutherAI/gpt-j-6b	✅	✅	✅	✅	✅
GPT-NEOX	EleutherAI/gpt-neox-20b	✅	✅	✅	✅	✅
DOLLY	databricks/dolly-v2-12b	✅	✅	✅	✅	✅
FALCON	tiiuae/falcon-7b	✅	✅	✅	✅	✅
FALCON	tiiuae/falcon-11b	✅	✅	✅	✅	✅
FALCON	tiiuae/falcon-40b	✅	✅	✅	✅	✅
OPT	facebook/opt-30b	✅	✅	✅	✅	✅
OPT	facebook/opt-1.3b	✅	✅	✅	✅	✅
Bloom	bigscience/bloom-1b7	✅	✅	✅	✅	✅
CodeGen	Salesforce/codegen-2B-multi	✅	✅	✅	✅	✅
Baichuan	baichuan-inc/Baichuan2-7B-Chat	✅	✅	✅	✅	✅
Baichuan	baichuan-inc/Baichuan2-13B-Chat	✅	✅	✅	✅	✅
Baichuan	baichuan-inc/Baichuan-13B-Chat	✅	✅	✅	✅	✅
ChatGLM	THUDM/chatglm3-6b	✅	✅	✅	✅	✅
ChatGLM	THUDM/chatglm2-6b	✅	✅	✅	✅	✅
GPTBigCode	bigcode/starcoder	✅	✅	✅	✅	✅
T5	google/flan-t5-xl	✅	✅	✅	✅	✅
MPT	mosaicml/mpt-7b	✅	✅	✅	✅	✅
Mistral	mistralai/Mistral-7B-v0.1	✅	✅	✅	✅	✅
Mixtral	mistralai/Mixtral-8x7B-v0.1	✅	✅		✅	✅
Stablelm	stabilityai/stablelm-2-1_6b	✅	✅	✅	✅	✅
Qwen	Qwen/Qwen-7B-Chat	✅	✅	✅	✅	✅
Qwen	Qwen/Qwen2-7B	✅	✅	✅	✅	✅
LLaVA	liuhaotian/llava-v1.5-7b	✅	✅		✅	✅
GIT	microsoft/git-base	✅	✅		✅	✅
Yuan	IEITYuan/Yuan2-102B-hf	✅	✅		✅
Phi	microsoft/phi-2	✅	✅	✅	✅	✅
Phi	microsoft/Phi-3-mini-4k-instruct	✅	✅	✅	✅	✅
Phi	microsoft/Phi-3-mini-128k-instruct	✅	✅	✅	✅	✅
Phi	microsoft/Phi-3-medium-4k-instruct	✅	✅	✅	✅	✅
Phi	microsoft/Phi-3-medium-128k-instruct	✅	✅	✅	✅	✅
Whisper	openai/whisper-large-v2	✅	✅	✅	✅	✅
Maira	microsoft/maira-2	✅	✅		✅	✅
Jamba	ai21labs/Jamba-v0.1	✅	✅		✅	✅
DeepSeek	deepseek-ai/DeepSeek-V2.5-1210	✅	✅		✅	✅
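
The two weight-only quantization columns above refer to storing only the weights as low-bit integers while activations stay in floating point. The sketch below illustrates that general idea with symmetric, per-output-channel scales in plain Python. It is an assumption-laden illustration, not the Intel® Extension for PyTorch* recipe: the function names are hypothetical, and the real implementation's granularity, zero points, and kernels may differ.

```python
# Minimal sketch of symmetric per-output-channel weight-only quantization (WOQ).
# Hypothetical helpers for illustration only; not the Intel Extension for PyTorch API.
from typing import List, Tuple

Matrix = List[List[float]]  # one row per output channel


def quantize_rows(weight: Matrix, bits: int = 4) -> Tuple[List[List[int]], List[float]]:
    """Quantize each row to signed ints in [-(2**(bits-1)), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8
    qmin = -(2 ** (bits - 1))   # -8 for INT4, -128 for INT8
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0  # symmetric: zero point is 0
        q_rows.append([max(qmin, min(qmax, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def woq_linear(x: List[float], q_rows: List[List[int]], scales: List[float]) -> List[float]:
    """y = W x, dequantizing W row by row; the activation x is never quantized."""
    # The per-row scale factors out of the dot product, so it is applied once per row.
    return [scale * sum(q * xi for q, xi in zip(row, x)) for row, scale in zip(q_rows, scales)]


if __name__ == "__main__":
    W = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.7]]
    x = [1.0, 2.0, 3.0]
    q, s = quantize_rows(W, bits=4)
    exact = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    print("int4 weights:", q)
    print("exact:", exact, "woq:", woq_linear(x, q, s))
```

Per-channel scales keep one outlier row from degrading the precision of every other row; passing `bits=8` gives the INT8 variant of the same scheme.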

Note: The verified models above (including other models in the same model family, such as "codellama/CodeLlama-7b-hf" from the LLAMA family) are well supported with all optimizations, such as indirect access KV cache, fused RoPE, and customized linear kernels. We are working to improve support for the models in the table across the various data types, and more models will be optimized in the future.
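
One of the optimizations named in the note is the indirect access KV cache. In beam search, each new beam descends from some parent beam, and a naive cache copies or reorders every beam's stored history at each step. The toy sketch below shows the general indirect-access idea under stated assumptions: store each step's entries once, record each slot's parent beam index, and recover a beam's history by following those indices. The class and function names are hypothetical, strings stand in for key/value tensors, and this is not the library's kernel.

```python
# Toy indirect-access KV cache for beam search (hypothetical names, illustration only).
# Strings stand in for per-step key/value tensors.
from typing import List, Sequence, Tuple


class IndirectKVCache:
    """Append-only storage plus a parent-beam table; nothing is copied on reorder."""

    def __init__(self, num_beams: int) -> None:
        self.num_beams = num_beams
        self.kv: List[List[str]] = []      # kv[t][slot]: entry written at step t
        self.parent: List[List[int]] = []  # parent[t][slot]: slot it extends at step t-1

    def append(self, new_kv: Sequence[str], parent_beams: Sequence[int]) -> None:
        assert len(new_kv) == len(parent_beams) == self.num_beams
        self.kv.append(list(new_kv))
        self.parent.append(list(parent_beams))

    def history(self, slot: int) -> List[str]:
        """Entries visible to `slot`, oldest first, found by following parent indices."""
        out: List[str] = []
        for t in range(len(self.kv) - 1, -1, -1):
            out.append(self.kv[t][slot])
            slot = self.parent[t][slot]
        out.reverse()
        return out


def naive_reorder(steps: Sequence[Tuple[Sequence[str], Sequence[int]]], num_beams: int) -> List[List[str]]:
    """Baseline that copies every beam's full history at each step."""
    hist: List[List[str]] = [[] for _ in range(num_beams)]
    for new_kv, parents in steps:
        hist = [hist[p] + [kv] for kv, p in zip(new_kv, parents)]
    return hist


if __name__ == "__main__":
    steps = [(["a0", "b0"], [0, 1]),  # prompt step
             (["a1", "b1"], [0, 0]),  # both beams extend old beam 0
             (["a2", "b2"], [1, 0])]  # beams swap parents
    cache = IndirectKVCache(num_beams=2)
    for new_kv, parents in steps:
        cache.append(new_kv, parents)
    print([cache.history(b) for b in range(2)])
    print(naive_reorder(steps, num_beams=2))
```

Both approaches yield the same per-beam histories; the indirect version trades the per-step copy for an index lookup when attention reads the cache.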

In addition, since release 2.3.0, Intel® Extension for PyTorch* has provided module-level optimization APIs (a prototype feature). These APIs offer optimized alternatives for several commonly used LLM modules and functionalities, intended for optimizing niche or customized LLMs. Read LLM module level optimization practice to learn how to optimize your own LLM and achieve better performance.
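
An optimized module alternative must compute the same result as the reference module it replaces. The sketch below illustrates that contract with a gated MLP (SiLU gate times an up projection), a block common in LLMs: an unfused reference version and a hypothetical fused version that computes both projections in one pass without intermediate lists. The function names are assumptions for illustration only; consult the Intel® Extension for PyTorch* API documentation for the actual module classes and signatures.

```python
# Reference vs. hypothetical fused gated MLP: out = silu(W_gate x) * (W_up x).
# Illustration of the "drop-in optimized alternative" idea; not the library's API.
import math
from typing import List

Matrix = List[List[float]]  # rows = output features, columns = input features


def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def linear(x: List[float], w: Matrix) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def gated_mlp_reference(x: List[float], w_gate: Matrix, w_up: Matrix) -> List[float]:
    """Unfused: two full linear passes, then SiLU, then an elementwise multiply."""
    gate = [silu(g) for g in linear(x, w_gate)]
    up = linear(x, w_up)
    return [g * u for g, u in zip(gate, up)]


def gated_mlp_fused(x: List[float], w_gate: Matrix, w_up: Matrix) -> List[float]:
    """Fused: one pass per output feature, applying SiLU and multiply immediately."""
    out: List[float] = []
    for row_g, row_u in zip(w_gate, w_up):
        g = u = 0.0
        for xi, wg, wu in zip(x, row_g, row_u):
            g += wg * xi
            u += wu * xi
        out.append(silu(g) * u)
    return out


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5]
    w_gate = [[0.1, 0.2, 0.3], [-0.5, 0.4, 1.0]]
    w_up = [[1.0, 0.0, -1.0], [0.3, 0.3, 0.3]]
    print(gated_mlp_reference(x, w_gate, w_up))
    print(gated_mlp_fused(x, w_gate, w_up))
```

In a real kernel the benefit of fusion comes from reading the input once and never writing the intermediate activations to memory; pure Python only shows that the two forms are numerically interchangeable.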

Support

The team tracks bugs and enhancement requests using GitHub issues. Before submitting a suggestion or bug report, search the existing GitHub issues to see if your issue has already been reported.

License

Apache License, Version 2.0, as found in the LICENSE file.

Security

See Intel's Security Center for information on how to report a potential security issue or vulnerability.
