We provide a collection of popular neural network models and compare their floating-point and quantized performance. The results demonstrate that quantized models can achieve accuracy comparable to their floating-point counterparts. Alongside the results, we provide scripts and artifacts for users to quantize floating-point models using the AI Model Efficiency ToolKit (AIMET).
Quantized inference is significantly faster than floating-point inference, and enables models to run in a power-efficient manner on mobile and edge devices. We use AIMET, a library that includes state-of-the-art techniques for quantization, to quantize various models available in PyTorch and TensorFlow frameworks.
An original FP32 source model is quantized using either the post-training quantization (PTQ) or the quantization-aware training (QAT) techniques available in AIMET. Example evaluation scripts are provided for each model. Where PTQ is needed, the evaluation script performs PTQ before evaluation; where QAT is used, the fine-tuned model checkpoint is also provided.
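As a rough illustration of the PTQ flow described above, the sketch below wraps a PyTorch model in AIMET's `QuantizationSimModel`, calibrates it on representative data, and hands the simulated-quantized model to an existing evaluation harness. This is a minimal sketch, not the zoo's per-model script: `model`, `calibration_loader`, and `evaluate` are assumed to be supplied by the caller, and the dummy-input shape assumes an ImageNet classifier.

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

def quantize_and_eval(model, calibration_loader, evaluate):
    model.eval()
    dummy_input = torch.randn(1, 3, 224, 224)  # shape assumed for an ImageNet classifier

    # Wrap the FP32 model with quantization-simulation ops (W8A8 here)
    sim = QuantizationSimModel(model,
                               dummy_input=dummy_input,
                               quant_scheme=QuantScheme.post_training_tf_enhanced,
                               default_param_bw=8,
                               default_output_bw=8)

    # Calibrate: run representative data through the model so AIMET can
    # compute quantization encodings (scale/offset) for each tensor
    def pass_calibration_data(sim_model, _):
        with torch.no_grad():
            for images, _labels in calibration_loader:
                sim_model(images)

    sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                          forward_pass_callback_args=None)

    # Evaluate the simulated-quantized model with the same harness as FP32
    return evaluate(sim.model)
```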
| Task | Network [1] | Model Source [2] | Floating Pt (FP32) Model [3] | Quantized Model [4] | Metric | FP32 | W8A8 [6] | W4A8 [7] |
|---|---|---|---|---|---|---|---|---|
| Image Classification | MobileNetV2 | GitHub Repo | Pretrained Model | Quantized Model | (ImageNet) Top-1 Accuracy | 71.67% | 71.14% | TBD |
| Image Classification | ResNet18 | PyTorch Torchvision | PyTorch Torchvision | Quantized Model | (ImageNet) Top-1 Accuracy | 69.75% | 69.54% | 69.10% |
| Image Classification | ResNet50 | PyTorch Torchvision | PyTorch Torchvision | Quantized Model | (ImageNet) Top-1 Accuracy | 76.14% | 75.81% | 75.63% |
| Image Classification | RegNet_x_3_2gf | PyTorch Torchvision | PyTorch Torchvision | Quantized Model | (ImageNet) Top-1 Accuracy | 78.36% | 78.10% | 77.70% |
| Image Classification | EfficientNet-lite0 | GitHub Repo | Pretrained Model | Quantized Model | (ImageNet) Top-1 Accuracy | 75.40% | 75.36% | 74.46% |
| Image Classification | ViT | Repo | Prepared Models | See Example | (ImageNet) Accuracy | 81.32 | 81.57 | TBD |
| Image Classification | MobileViT | Repo | Prepared Models | See Example | (ImageNet) Accuracy | 78.46 | 77.59 | TBD |
| Object Detection | MobileNetV2-SSD-Lite | GitHub Repo | Pretrained Model | Quantized Model | (PascalVOC) mAP | 68.7% | 68.6% | TBD |
| Pose Estimation | Pose Estimation | Based on Ref. | Based on Ref. | Quantized Model | (COCO) mAP | 0.364 | 0.359 | TBD |
| | | | | | (COCO) mAR | 0.436 | 0.432 | TBD |
| Pose Estimation | HRNet-Posenet | Based on Ref. | FP32 Model | Quantized Model | (COCO) mAP | 0.765 | 0.763 | 0.762 |
| | | | | | (COCO) mAR | 0.793 | 0.792 | 0.791 |
| Super Resolution | SRGAN | GitHub Repo | Pretrained Model (older version from here) | See Example | (BSD100) PSNR / SSIM (Detailed Results) | 25.51 / 0.653 | 25.5 / 0.648 | TBD |
| Super Resolution | Anchor-based Plain Net (ABPN) | Based on Ref. | See Tarballs | See Example | Average PSNR | See Results | See Results | TBD |
| Super Resolution | Extremely Lightweight Quantization Robust Real-Time Single-Image Super Resolution (XLSR) | Based on Ref. | See Tarballs | See Example | Average PSNR | See Results | See Results | TBD |
| Super Resolution | Super-Efficient Super Resolution (SESR) | Based on Ref. | See Tarballs | See Example | Average PSNR | See Results | See Results | TBD |
| Super Resolution | QuickSRNet | - | See Tarballs | See Example | Average PSNR | See Results | See Results | TBD |
| Semantic Segmentation | DeepLabV3+ | GitHub Repo | Pretrained Model | Quantized Model | (PascalVOC) mIOU | 72.91% | 72.44% | 72.18% |
| Semantic Segmentation | HRNet-W48 | GitHub Repo | Original model weights not available | See Example | (Cityscapes) mIOU | 81.04% | 80.65% | 80.07% |
| Semantic Segmentation | InverseForm (HRNet-16-Slim-IF) | GitHub Repo | Pretrained Model | See Example | (Cityscapes) mIOU | 77.81% | 77.17% | TBD |
| Semantic Segmentation | InverseForm (OCRNet-48) | GitHub Repo | Pretrained Model | See Example | (Cityscapes) mIOU | 86.31% | 86.21% | TBD |
| Semantic Segmentation | FFNets | GitHub Repo | Prepared Models (5 in total) | See Example | mIoU | See Results | See Results | TBD |
| Speech Recognition | DeepSpeech2 | GitHub Repo | Pretrained Model | See Example | (LibriSpeech Test Clean) WER | 9.92% | 10.22% | TBD |
| NLP / NLU | BERT | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 83.11 | 82.44 | TBD |
| | | | | | (SQuAD dataset) F1 score | 88.48 | 87.47 | TBD |
| | | | | | Detailed Results | | | |
| NLP / NLU | MobileBERT | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 81.24 | 81.17 | TBD |
| | | | | | (SQuAD dataset) F1 score | 89.45 | 88.66 | TBD |
| | | | | | Detailed Results | | | |
| NLP / NLU | MiniLM | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 82.23 | 82.63 | TBD |
| | | | | | (SQuAD dataset) F1 score | 90.47 | 89.70 | TBD |
| | | | | | Detailed Results | | | |
| NLP / NLU | RoBERTa | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 85.11 | 84.26 | TBD |
| | | | | | Detailed Results | | | |
| NLP / NLU | DistilBERT | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 80.71 | 80.26 | TBD |
| | | | | | (SQuAD dataset) F1 score | 85.42 | 85.18 | TBD |
| | | | | | Detailed Results | | | |
[1] Model usage documentation
[2] Original FP32 model source
[3] FP32 model checkpoint
[4] Quantized Model: For models quantized with a post-training technique, this points to the FP32 model, which can then be quantized using AIMET. For models optimized with QAT, it points to the model checkpoint with fine-tuned weights. 8-bit weights and activations are typically used. For some models, 8-bit weights and 16-bit activations are used to further improve the performance of post-training quantization.
[5] Results comparing float and quantized performance
[6] W8A8 indicates 8-bit weights, 8-bit activations
[7] W4A8 indicates 4-bit weights, 8-bit activations (some models mix W4A8 and W8A8 layers; see the sketch after this list).
TBD indicates that support is NOT yet available
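Footnote [7] notes that the W4A8 results can mix 4-bit and 8-bit weight layers. One plausible way to express such a mix with the `aimet_torch` v1 API is sketched below: after building a `QuantizationSimModel`, lower the weight quantizer of selected wrapped layers to 4 bits and recompute encodings. The layer selection and the `layer_names` argument are illustrative assumptions, not the zoo's actual per-model configuration.

```python
from aimet_torch.qc_quantize_op import QcQuantizeWrapper

def set_w4_weights(sim, layer_names):
    """Drop the weight quantizers of the named wrapped layers to 4 bits,
    leaving all activations at 8 bits (the W4A8/W8A8 mix of footnote [7])."""
    for name, module in sim.model.named_modules():
        if isinstance(module, QcQuantizeWrapper) and name in layer_names:
            weight_quantizer = module.param_quantizers.get('weight')
            if weight_quantizer is not None:
                weight_quantizer.bitwidth = 4
    # Encodings must be recomputed via sim.compute_encodings(...) after any
    # bitwidth change, before the model is evaluated or exported.
```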
| Task | Network [1] | Model Source [2] | Floating Pt (FP32) Model [3] | Quantized Model [4] | TensorFlow Version | Metric | FP32 | W8A8 [6] | W4A8 [7] |
|---|---|---|---|---|---|---|---|---|---|
| Image Classification | ResNet-50 (v1) | GitHub Repo | Pretrained Model | See Documentation | 1.15 | (ImageNet) Top-1 Accuracy | 75.21% | 74.96% | TBD |
| Image Classification | MobileNet-v2-1.4 | GitHub Repo | Pretrained Model | Quantized Model | 1.15 | (ImageNet) Top-1 Accuracy | 75% | 74.21% | TBD |
| Image Classification | EfficientNet Lite | GitHub Repo | Pretrained Model | Quantized Model | 2.4 | (ImageNet) Top-1 Accuracy | 74.93% | 74.99% | TBD |
| Object Detection | SSD MobileNet-v2 | GitHub Repo | Pretrained Model | See Example | 1.15 | (COCO) mAP | 0.2469 | 0.2456 | TBD |
| Object Detection | RetinaNet | GitHub Repo | Pretrained Model | See Example | 1.15 | (COCO) mAP (Detailed Results) | 0.35 | 0.349 | TBD |
| Object Detection | MobileDet-EdgeTPU | GitHub Repo | Pretrained Model | See Example | 2.4 | (COCO) mAP | 0.281 | 0.279 | TBD |
| Pose Estimation | Pose Estimation | Based on Ref. | Based on Ref. | Quantized Model | 2.4 | (COCO) mAP | 0.383 | 0.379 | TBD |
| | | | | | | (COCO) mAR | 0.452 | 0.446 | TBD |
| Super Resolution | SRGAN | GitHub Repo | Pretrained Model | See Example | 2.4 | (BSD100) PSNR / SSIM (Detailed Results) | 25.45 / 0.668 | 24.78 / 0.628 | 25.41 / 0.666 (INT8W / INT16Act.) |
[1] Model usage documentation
[2] Original FP32 model source
[3] FP32 model checkpoint
[4] Quantized Model: For models quantized with post-training technique, refers to FP32 model which can then be quantized using AIMET. For models optimized with QAT, refers to model checkpoint with fine-tuned weights. 8-bit weights and activations are typically used. For some models, 8-bit weights and 16-bit activations (INT8W/INT16Act.) are used to further improve performance of post-training quantization.
[5] Results comparing float and quantized performance
[6] W8A8 indicates 8-bit weights, 8-bit activations
[7] W4A8 indicates 4-bit weights, 8-bit activations (Some models include a mix of W4A8 and W8A8 layers).
TBD indicates that support is NOT yet available
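For the TensorFlow entries above, the quantization-simulation flow is analogous. The sketch below is a hedged example assuming a Keras model and the `aimet_tensorflow.keras` variant of the API, which is available in recent AIMET releases (the TF 1.15 models in the table instead use the session-based `aimet_tensorflow.quantsim` API). `model` and `calibration_dataset` are assumed inputs supplied by the caller.

```python
from aimet_tensorflow.keras.quantsim import QuantizationSimModel

def quantize_keras_model(model, calibration_dataset):
    # Wrap the Keras model with W8A8 quantization-simulation ops
    sim = QuantizationSimModel(model,
                               default_param_bw=8,
                               default_output_bw=8)

    # Calibrate on representative batches to compute quantization encodings
    def pass_calibration_data(sim_model, _):
        for batch in calibration_dataset:
            sim_model(batch, training=False)

    sim.compute_encodings(pass_calibration_data, None)
    return sim  # sim.model is then evaluated exactly like the FP32 model
```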
Before you can run the evaluation script for a specific model, you need to install the AI Model Efficiency ToolKit (AIMET) software. Please see this Getting Started page for an overview. Then install AIMET and its dependencies using these Installation instructions.
The evaluation scripts run both floating-point and quantized evaluations, demonstrating the quantized-model performance achieved with AIMET techniques. They generate and display the final accuracy results (as documented in the tables above). For the documentation and procedure for a specific model, refer to the relevant TensorFlow or PyTorch model folder.
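Schematically, each evaluation script does something like the following sketch: score the FP32 model, build and calibrate a quantization simulation (as in the earlier PyTorch sketch), then score the quantized model with the same harness and report both numbers. `top1_accuracy` is a generic stand-in for illustration; the real per-model scripts use each task's own metric (mAP, mIOU, PSNR, WER, and so on) and data pipeline.

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

def top1_accuracy(model, loader):
    """Fraction of samples whose argmax prediction matches the label."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            correct += (model(images).argmax(dim=1) == labels).sum().item()
            total += labels.numel()
    return correct / total

def compare_fp32_vs_quantized(model, val_loader, calib_loader, dummy_input):
    # 1. Score the original floating-point model
    fp32_acc = top1_accuracy(model, val_loader)

    # 2. Build a W8A8 quantization simulation and calibrate it
    sim = QuantizationSimModel(model, dummy_input=dummy_input,
                               quant_scheme=QuantScheme.post_training_tf_enhanced,
                               default_param_bw=8, default_output_bw=8)
    sim.compute_encodings(lambda m, _: top1_accuracy(m, calib_loader), None)

    # 3. Score the quantized model with the identical harness and report both
    quant_acc = top1_accuracy(sim.model, val_loader)
    print(f"FP32 top-1: {fp32_acc:.2%} | W8A8 top-1: {quant_acc:.2%}")
```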
AIMET Model Zoo is a project maintained by Qualcomm Innovation Center, Inc.
Please see the LICENSE file for details.