This TensorFlow 2.x Quantization toolkit quantizes TensorFlow 2.x Keras models by inserting Quantization/DeQuantization (Q/DQ) nodes, preparing them for Quantization-Aware Training (QAT). It follows NVIDIA's QAT recipe, which leads to optimal model acceleration with TensorRT on NVIDIA GPUs and hardware accelerators.
- Implements the NVIDIA quantization recipe.
- Supports fully automated or manual insertion of Quantization and DeQuantization (QDQ) nodes in the TensorFlow 2.x model with minimal code.
- Makes it easy to add support for new layers.
- Quantization behavior can be set programmatically.
- Implements automatic tests for popular architecture blocks such as residual and inception.
- Offers utilities for TensorFlow 2.x to TensorRT conversion via ONNX.
- Includes example workflows.
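To illustrate what the Q/DQ nodes mentioned above compute, here is a minimal sketch of symmetric int8 fake quantization with a per-tensor scale of amax/127 (the symmetric scheme TensorRT expects for QAT models). This is illustrative plain Python, not the toolkit's implementation:

```python
def quantize_dequantize(values, amax):
    """Simulate a Q/DQ node pair: round to the int8 grid, then map back to float."""
    scale = amax / 127.0
    out = []
    for v in values:
        q = round(v / scale)            # quantize: scale and round to integer grid
        q = max(-127, min(127, q))      # clip to the symmetric int8 range [-127, 127]
        out.append(q * scale)           # dequantize: map back to float
    return out

x = [0.5, -1.2, 3.0, 10.0]
amax = max(abs(v) for v in x)           # calibration: per-tensor absolute maximum
xq = quantize_dequantize(x, amax)
```

During QAT, training proceeds through this round-and-clip operation, so the model's weights adapt to the int8 grid before deployment; the maximum round-trip error for in-range values is half the scale.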
Python >= 3.8
TensorFlow >= 2.8
tf2onnx >= 1.10.1
onnx-graphsurgeon
pytest
pytest-html
TensorRT (optional) >= 8.4 GA
The latest TensorFlow 2.x Docker image from NGC is recommended.
$ cd ~/
$ git clone https://github.com/NVIDIA/TensorRT.git
$ docker pull nvcr.io/nvidia/tensorflow:22.03-tf2-py3
$ docker run -it --runtime=nvidia --gpus all --net host -v ~/TensorRT/tools/tensorflow-quantization:/home/tensorflow-quantization nvcr.io/nvidia/tensorflow:22.03-tf2-py3 /bin/bash
After the last command, you will be placed in the /workspace directory inside the running Docker container, while the tensorflow-quantization repository is mounted under the /home directory.
$ cd /home/tensorflow-quantization
$ ./install.sh
$ cd tests
$ python3 -m pytest quantize_test.py -rP
If all tests pass, installation is successful.
$ cd ~/
$ git clone https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/tools/tensorflow-quantization
$ ./install.sh
$ cd tests
$ python3 -m pytest quantize_test.py -rP
If all tests pass, installation is successful.
TensorFlow 2.x Quantization toolkit user guide.
- Only Quantization Aware Training (QAT) is supported as a quantization method.
- Only Functional and Sequential Keras models are supported. Original Keras layers are wrapped into quantized layers using TensorFlow's `clone_model` method, which doesn't support subclassed models.
- Saving the quantized version of a few layers may not be supported in TensorFlow < 2.8: `DepthwiseConv2D` support was added in TF 2.8.
- `Conv2DTranspose` is not yet supported by TF (see the open bug here). However, there's a workaround if you only need the ONNX file and not the TF2 SavedModel file:
  1. Implement `Conv2DTransposeQuantizeWrapper`. See our user guide for more information on how to do that.
  2. Convert the quantized Keras model to ONNX using our provided utility function `convert_keras_model_to_onnx`.
- GTC 2022 talk
- Quantization Basics whitepaper