
nnieqat-pytorch

nnieqat is a quantization-aware training (QAT) package for the Neural Network Inference Engine (NNIE) on PyTorch. It uses the HiSilicon quantization library to quantize each module's weights and input data into a fake-quantized fp32 format. To train a model that is friendlier to NNIE, import nnieqat and replace the default torch.nn modules with the corresponding ones.

Note: import nnieqat before torch modules. Multi-GPU training is not supported.
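
To make "fake fp32" concrete: a value is quantized onto a coarse grid and immediately dequantized, so it is still stored as a float but can only take values that survive quantization. The sketch below uses plain symmetric uniform 8-bit quantization as a stand-in. It is not the HiSVP GFPQ scheme that nnieqat calls internally, and `fake_quantize` is a hypothetical name used only for illustration.

```python
def fake_quantize(values, num_bits=8):
    """Quantize-dequantize a list of floats on a symmetric uniform grid.

    Illustration only: NOT the HiSVP GFPQ scheme used by nnieqat.
    """
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                   # nothing to quantize
    scale = max_abs / qmax                    # float distance between grid points
    out = []
    for v in values:
        q = round(v / scale)                  # integer code
        q = max(-qmax, min(qmax, q))          # clamp to the representable range
        out.append(q * scale)                 # back to float ("fake fp32")
    return out


weights = [0.5, -1.27, 0.003, 1.0]
fq = fake_quantize(weights)
```

Small values such as `0.003` collapse to zero on this grid, which is the kind of error the network gets to adapt to during quantization-aware training.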

Table of Contents

  1. Installation
  2. Usage
  3. Code Examples
  4. Results
  5. Todo
  6. Reference

Installation

  • Supported Platforms: Linux

  • Accelerators and GPUs: NVIDIA GPUs via CUDA driver 10.1 or 10.2.

  • Dependencies:

    • python >= 3.5, < 4
    • llvmlite >= 0.31.0
    • pytorch >= 1.5
    • numba >= 0.42.0
    • numpy >= 1.18.1
  • Install nnieqat via pypi:

    $ pip install nnieqat
  • Install nnieqat in Docker (an easy way to avoid environment problems):

    $ cd docker
    $ docker build -t nnieqat-image .
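
For a non-Docker setup, the dependency list above can be checked from Python before installing. This is a minimal sketch, not part of nnieqat: `MINIMUMS`, `parse_version`, `at_least` and `check_dependencies` are hypothetical names, `torch` is assumed to be the pip distribution name for pytorch, and `importlib.metadata` needs Python 3.8 or newer.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
# "torch" is assumed to be the pip distribution name for pytorch.
MINIMUMS = {
    "llvmlite": (0, 31, 0),
    "torch": (1, 5),
    "numba": (0, 42, 0),
    "numpy": (1, 18, 1),
}


def parse_version(text):
    """Turn '1.5.0+cu101' into (1, 5, 0); stops at the first non-numeric part."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def at_least(installed, minimum):
    """Compare version tuples after padding the shorter one with zeros."""
    n = max(len(installed), len(minimum))
    pad = lambda t: t + (0,) * (n - len(t))
    return pad(installed) >= pad(minimum)


def check_dependencies(minimums=MINIMUMS):
    """Return {package: problem} for every missing or too-old package."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not at_least(parse_version(installed), minimum):
            wanted = ".".join(map(str, minimum))
            problems[name] = f"found {installed}, need >= {wanted}"
    return problems
```

Calling `check_dependencies()` returns an empty dictionary when every listed package is present and recent enough.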
    

Usage

  • Add the quantization hook.

    Weights and data are quantized and dequantized with the HiSVP GFPQ library during the forward() pass.

    from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
    ...
    ...
      register_quantization_hook(model)
    ...
  • Merge BN weights into conv and freeze BN.

    Finetuning from a well-trained model is suggested; in that case, call merge_freeze_bn at the beginning. Otherwise, call it after a few epochs of training.

    from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
    ...
    ...
        model.train()
        model = merge_freeze_bn(model)  # switches BN layers to eval() mode during training
    ...
  • Unquantize the weights before updating them.

    from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
    ...
    ...
        model.apply(unquant_weight)  # use the original weights for the update
        optimizer.step()
    ...
  • Dump the weight-optimized model.

    from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
    ...
    ...
        model.apply(quant_dequant_weight)
        save_checkpoint(...)
        model.apply(unquant_weight)
    ...
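
Taken together, the steps above suggest the following ordering inside a training loop. This is a minimal sketch, not the project's training script: `train_one_epoch` and `save_quantized_checkpoint` are hypothetical helpers, and the nnieqat functions are passed in as arguments so the ordering can be read (and exercised with stand-ins) without the library installed. `model`, `loader`, `optimizer`, `loss_fn` and `save_fn` are assumed to be the usual PyTorch-style objects.

```python
def train_one_epoch(model, loader, optimizer, loss_fn, unquant_weight):
    """One epoch, restoring the original weights before each optimizer step."""
    for inputs, targets in loader:
        optimizer.zero_grad()
        # With the quantization hook registered, forward() sees
        # quantized-dequantized weights and data.
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        model.apply(unquant_weight)   # update the original weights, not the quantized copy
        optimizer.step()


def save_quantized_checkpoint(model, save_fn, quant_dequant_weight, unquant_weight):
    """Save quantized weights, then restore the originals so training can continue."""
    model.apply(quant_dequant_weight)
    save_fn(model)
    model.apply(unquant_weight)
```

With nnieqat installed, setup would come first: `register_quantization_hook(model)` once, then `model.train()` followed by `model = merge_freeze_bn(model)`, as described in the bullets above.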

Code Examples
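
As an illustration of the "merge BN weights into conv" step from Usage, the standard batch-norm folding arithmetic for a single output channel can be sketched as below. This is the textbook formula, assumed here rather than taken from nnieqat's source; `fold_bn_channel` is a hypothetical name, and plain Python floats stand in for tensors.

```python
import math


def fold_bn_channel(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN statistics of one output channel into its conv weights and bias.

    BN(conv(x)) = gamma * (w.x + b - mean) / sqrt(var + eps) + beta
                = (w * s).x + ((b - mean) * s + beta),  with s = gamma / sqrt(var + eps)
    """
    s = gamma / math.sqrt(var + eps)
    folded_w = [w * s for w in conv_w]
    folded_b = (conv_b - mean) * s + beta
    return folded_w, folded_b


# Check on one input patch: the folded conv matches conv followed by BN.
w, b = [0.2, -0.4, 0.1], 0.05
gamma, beta, mean, var = 1.5, 0.1, 0.3, 4.0
x = [1.0, 2.0, 3.0]

conv_out = sum(wi * xi for wi, xi in zip(w, x)) + b
bn_out = gamma * (conv_out - mean) / math.sqrt(var + 1e-5) + beta

fw, fb = fold_bn_channel(w, b, gamma, beta, mean, var)
folded_out = sum(wi * xi for wi, xi in zip(fw, x)) + fb
```

Folding only reproduces the BN output exactly when the BN statistics are fixed, which is consistent with merge_freeze_bn putting BN into eval() mode.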

Results

  • ImageNet

    python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1  --lr 0.001 --pretrained --epoch 10   # nnie_lr_e-3_ft
    python pytorh_imagenet_main.py /data/imgnet/ --arch squeezenet1_1  --lr 0.0001 --pretrained --epoch 10  # lr_e-4_ft
    python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1  --lr 0.0001 --pretrained --epoch 10  # nnie_lr_e-4_ft
    

    finetune result:

    model             trt_fp32   trt_int8   nnie
    torchvision       0.56992    0.56424    0.56026
    nnie_lr_e-3_ft    0.56600    0.56328    0.56612
    lr_e-4_ft         0.57884    0.57502    0.57542
    nnie_lr_e-4_ft    0.57834    0.57524    0.57730
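
    One way to read these numbers is to look at how much each model loses when moving from trt_fp32 to nnie. The snippet below only redoes that subtraction on the values listed above; the `results` dictionary and the `fp32_to_nnie_gap` name are illustrative, and the values are assumed to be accuracies (higher is better).

```python
# Values copied from the finetune result above: (trt_fp32, trt_int8, nnie)
results = {
    "torchvision":    (0.56992, 0.56424, 0.56026),
    "nnie_lr_e-3_ft": (0.56600, 0.56328, 0.56612),
    "lr_e-4_ft":      (0.57884, 0.57502, 0.57542),
    "nnie_lr_e-4_ft": (0.57834, 0.57524, 0.57730),
}


def fp32_to_nnie_gap(row):
    """Difference between the trt_fp32 and nnie columns of one row."""
    trt_fp32, _trt_int8, nnie = row
    return round(trt_fp32 - nnie, 5)


gaps = {name: fp32_to_nnie_gap(row) for name, row in results.items()}
for name, gap in gaps.items():
    print(f"{name:16s} {gap:+.5f}")
```

    On these numbers, the two rows finetuned with nnieqat (nnie_lr_e-3_ft and nnie_lr_e-4_ft) show the smallest gaps between trt_fp32 and nnie.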

Todo

  • Multiple GPU training support.

  • Generate quantized model directly.

Reference

HiSVP Quantization Library User Guide (HiSVP 量化库使用指南)

Quantizing deep convolutional networks for efficient inference: A whitepaper

8-bit Inference with TensorRT

Distilling the Knowledge in a Neural Network