Post-training Quantization:
Features:
- (TensorFlow) The `nncf.quantize()` method is now the recommended API for Quantization-Aware Training. Please refer to an example for more details about how to use the new approach.
- (TensorFlow) Compression layer placement in a model can now be serialized and restored with the new API functions `nncf.tensorflow.get_config()` and `nncf.tensorflow.load_from_config()`. Please see the documentation on saving/loading a quantized model for more details.
- (OpenVINO) Added an example of LLM quantization to FP8 precision.
- (TorchFX, Experimental) Preview support for the new `quantize_pt2e` API has been introduced, enabling quantization of `torch.fx.GraphModule` models with the `OpenVINOQuantizer` and `X86InductorQuantizer` quantizers. The `quantize_pt2e` API utilizes MinMax algorithm statistic collectors, as well as the SmoothQuant, BiasCorrection and FastBiasCorrection Post-Training Quantization algorithms.
- Added unification of scales for the ScaledDotProductAttention operation.
Fixes:
- (ONNX) Fixed sporadic accuracy issues with the BiasCorrection algorithm.
- (ONNX) Fixed GroupConvolution operation weight quantization, which also improves performance for a number of models.
- Fixed the AccuracyAwareQuantization algorithm to resolve issue #3118.
- Fixed an issue with using NNCF alongside potentially corrupted backend frameworks.
Improvements:
- (TorchFX, Experimental) Added YoloV11 support.
- (OpenVINO) Improved the performance of the FastBiasCorrection algorithm.
- Significantly faster data-free weight compression for OpenVINO models: INT4 compression is now up to 10x faster, while INT8 compression is up to 3x faster. The larger the model, the greater the time reduction.
- AWQ weight compression is now up to 2x faster, improving overall runtime efficiency.
- Peak memory usage during INT4 data-free weight compression in the OpenVINO backend is reduced by up to 50% for certain models.
Tutorials:
- Post-Training Optimization of GLM-Edge-V Model
- Post-Training Optimization of OmniGen Model
- Post-Training Optimization of Sana Models
- Post-Training Optimization of BGE Models
- Post-Training Optimization of Stable Diffusion Inpainting Model
- Post-Training Optimization of LTX Video Model
- Post-Training Optimization of DeepSeek-R1-Distill Model
- Post-Training Optimization of Janus DeepSeek-LLM-1.3b Model
Deprecations/Removals:
- (TensorFlow) The `nncf.tensorflow.create_compressed_model()` method is now marked as deprecated. Please use the `nncf.quantize()` method for quantization initialization.
Requirements:
- Updated the minimal version for `numpy` (>=1.24.0).
- Removed the `tqdm` dependency.
Acknowledgements
Thanks for contributions from the OpenVINO developer community:
@rk119
@devesh-2002