nnieqat is a quantization-aware training (QAT) package for the Neural Network Inference Engine (NNIE) on PyTorch. It uses the HiSilicon quantization library to quantize a module's weights and input data in a fake (simulated) fp32 format. To train a model that is more NNIE-friendly, simply import nnieqat and replace the default torch.nn modules with the corresponding ones.
Note: import nnieqat before any torch modules. Multi-GPU training is not supported.
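For intuition, here is a minimal sketch of generic symmetric fake quantization (snap values to an integer grid, then map back to float so training "sees" the quantization error while tensors stay fp32). This is illustrative only; the HiSVP GFPQ library uses its own quantization scheme.

```python
def fake_quant_dequant(values, bits=8):
    """Simulate quantization error while keeping values in float.

    Generic symmetric uniform scheme for illustration only; it is not
    the HiSVP GFPQ algorithm.
    """
    qmax = 2 ** (bits - 1) - 1             # e.g. 127 for 8 bits
    m = max(abs(v) for v in values)
    if m == 0.0:
        return list(values)                # all-zero input: nothing to quantize
    scale = m / qmax                       # width of one grid step
    return [round(v / scale) * scale for v in values]
```

After the round trip, every value differs from the original by at most half a grid step.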
-
Supported Platforms: Linux
-
Accelerators and GPUs: NVIDIA GPUs via CUDA driver 10.1 or 10.2.
-
Dependencies:
- python >= 3.5, < 4
- llvmlite >= 0.31.0
- pytorch >= 1.5
- numba >= 0.42.0
- numpy >= 1.18.1
-
Install nnieqat via pypi:
$ pip install nnieqat
-
Install nnieqat in docker(easy way to solve environment problems):
$ cd docker
$ docker build -t nnieqat-image .
-
Add the quantization hook.
Weights and input data are quantized and dequantized with the HiSVP GFPQ library during the forward() pass.
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
register_quantization_hook(model)
...
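Conceptually, the hook wraps each module's forward computation so that values are fake-quantized before use. A torch-free sketch of that idea, with hypothetical names (this is not nnieqat's implementation):

```python
def with_fake_quant(forward, quantize):
    # Wrap a forward function so every input passes through a
    # quantize/dequantize step first, as a quantization hook would do.
    def wrapped(*inputs):
        return forward(*(quantize(x) for x in inputs))
    return wrapped

# usage: a toy "layer" whose inputs get snapped to a 0.1-wide grid
layer = with_fake_quant(lambda a, b: a + b, lambda x: round(x * 10) / 10)
```

The model code itself stays unchanged; only the wrapped call sees quantized values.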
-
Merge BN weights into the preceding conv layers and freeze BN.
Finetuning from a well-trained model is recommended; in that case, call merge_freeze_bn at the beginning. Otherwise, do it after a few epochs of training.
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
model.train()
model = merge_freeze_bn(model)  # it will switch BN to eval() mode during training
...
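The folding itself is standard: per output channel, BN's affine transform is absorbed into the conv weight and bias. A per-channel sketch with scalar values (illustrative names, not nnieqat's internals):

```python
import math

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    # BN computes gamma * (y - mean) / sqrt(var + eps) + beta on the
    # conv output y = w*x + b; folding moves that scale and shift
    # into the conv's own weight and bias.
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta
```

For any input, conv followed by BN and the folded conv produce the same output, which is why BN can then be frozen.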
-
Unquantize weights before updating them.
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
model.apply(unquant_weight)  # use the original weights while updating
optimizer.step()
...
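The reason for restoring full-precision weights first: a small gradient step applied to an already-quantized weight can be snapped back to the same grid point, so no progress accumulates. A minimal numeric sketch with a generic uniform grid (the 0.1 step size is hypothetical):

```python
def quantize(x, step=0.1):
    # snap x to the nearest multiple of step
    return round(x / step) * step

# updating the quantized copy: the 0.01 gradient step is rounded away
w_q = quantize(0.50)               # 0.5
w_q_after = quantize(w_q - 0.01)   # snaps back to 0.5 -> no progress

# updating the full-precision shadow weight: the step survives and is
# only re-quantized during the next forward pass
w_fp = 0.50 - 0.01                 # 0.49
```

This is why the optimizer must step on the original weights, with quantization applied only in forward().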
-
Dump the weight-quantized model.
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
model.apply(quant_dequant_weight)
save_checkpoint(...)
model.apply(unquant_weight)
...
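The same quantize-save-restore pattern, sketched with a plain dict standing in for a model state_dict (hypothetical helper, not nnieqat's API):

```python
import copy
import pickle

def dump_quantized(weights, quantize, path):
    # Quantize the weights, write the checkpoint, and hand back the
    # full-precision copies so training can continue from them.
    fp32 = copy.deepcopy(weights)                    # shadow copy to restore
    snapshot = {k: [quantize(x) for x in v] for k, v in weights.items()}
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)                     # checkpoint holds quantized values
    return fp32
```

The saved checkpoint contains the weights the inference engine will actually see, while training state stays at full precision.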
-
Cifar10 quantization aware training example (add nnieqat into pytorch_cifar10_tutorial)
python test/test_cifar10.py
-
ImageNet quantization finetuning example (add nnieqat into pytorch_imagenet_main.py)
python test/test_imagenet.py --pretrained path_to_imagenet_dataset
-
ImageNet finetuning comparison:
python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1 --lr 0.001 --pretrained --epoch 10   # nnie_lr_e-3_ft
python pytorch_imagenet_main.py /data/imgnet/ --arch squeezenet1_1 --lr 0.0001 --pretrained --epoch 10   # lr_e-4_ft
python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1 --lr 0.0001 --pretrained --epoch 10   # nnie_lr_e-4_ft
Finetune results:
                  trt_fp32   trt_int8   nnie
torchvision       0.56992    0.56424    0.56026
nnie_lr_e-3_ft    0.56600    0.56328    0.56612
lr_e-4_ft         0.57884    0.57502    0.57542
nnie_lr_e-4_ft    0.57834    0.57524    0.57730
-
Todo:
- Multiple-GPU training support.
- Generate quantized models directly.
References:
- HiSVP 量化库使用指南 (HiSVP Quantization Library User Guide)
- Quantizing deep convolutional networks for efficient inference: A whitepaper