
Qualcomm Innovation Center, Inc.

Model Zoo for AI Model Efficiency Toolkit

We provide a collection of popular neural network models and compare their floating-point and quantized performance. The results demonstrate that quantized models can achieve accuracy comparable to that of floating-point models. Alongside the results, we provide scripts and artifacts for users to quantize floating-point models using the AI Model Efficiency Toolkit (AIMET).

Table of Contents

- Introduction
- PyTorch Models
- TensorFlow Models
- Usage
- Team
- License

Introduction

Quantized inference is significantly faster than floating-point inference, and enables models to run in a power-efficient manner on mobile and edge devices. We use AIMET, a library that includes state-of-the-art techniques for quantization, to quantize various models available in PyTorch and TensorFlow frameworks.

An original FP32 source model is quantized using either post-training quantization (PTQ) or quantization-aware training (QAT), both available in AIMET. An example evaluation script is provided for each model. Where PTQ is used, the evaluation script performs PTQ before evaluation; where QAT is used, the fine-tuned model checkpoint is also provided.
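
For illustration, here is a minimal PTQ sketch using AIMET's PyTorch API (the exact QuantizationSimModel signature varies across AIMET releases, and the random calibration tensors below are a stand-in for real samples):

```python
import torch
from torchvision.models import resnet18

from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

# FP32 source model (ResNet18 from torchvision, as in the table below).
model = resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Simulate W8A8 quantization: 8-bit weights, 8-bit activations.
sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.post_training_tf_enhanced,
    default_param_bw=8,   # weight bitwidth
    default_output_bw=8,  # activation bitwidth
)

# Calibration pass: run unlabeled samples through the model so AIMET can
# compute quantization encodings (scale/offset) for each quantizer.
# Random tensors stand in for real calibration images here.
def pass_calibration_data(sim_model, _):
    with torch.no_grad():
        for _ in range(4):
            sim_model(torch.randn(8, 3, 224, 224))

sim.compute_encodings(pass_calibration_data, forward_pass_callback_args=None)

# sim.model now simulates quantized inference and can be fed to the same
# evaluation pipeline as the FP32 model.
```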

PyTorch Models

| Task | Network [1] | Model Source [2] | Floating Pt (FP32) Model [3] | Quantized Model [4] | Metric [5] | FP32 | W8A8 [6] | W4A8 [7] |
|---|---|---|---|---|---|---|---|---|
| Image Classification | MobileNetV2 | GitHub Repo | Pretrained Model | Quantized Model | (ImageNet) Top-1 Accuracy | 71.67% | 71.14% | TBD |
| Image Classification | ResNet18 | PyTorch Torchvision | PyTorch Torchvision | Quantized Model | (ImageNet) Top-1 Accuracy | 69.75% | 69.54% | 69.1% |
| Image Classification | ResNet50 | PyTorch Torchvision | PyTorch Torchvision | Quantized Model | (ImageNet) Top-1 Accuracy | 76.14% | 75.81% | 75.63% |
| Image Classification | Regnet_x_3_2gf | PyTorch Torchvision | PyTorch Torchvision | Quantized Model | (ImageNet) Top-1 Accuracy | 78.36% | 78.10% | 77.70% |
| Image Classification | EfficientNet-lite0 | GitHub Repo | Pretrained Model | Quantized Model | (ImageNet) Top-1 Accuracy | 75.40% | 75.36% | 74.46% |
| Image Classification | ViT | Repo | Prepared Models | See Example | (ImageNet) Accuracy | 81.32 | 81.57 | TBD |
| Image Classification | MobileViT | Repo | Prepared Models | See Example | (ImageNet) Accuracy | 78.46 | 77.59 | TBD |
| Object Detection | MobileNetV2-SSD-Lite | GitHub Repo | Pretrained Model | Quantized Model | (PascalVOC) mAP | 68.7% | 68.6% | TBD |
| Pose Estimation | Pose Estimation | Based on Ref. | Based on Ref. | Quantized Model | (COCO) mAP | 0.364 | 0.359 | TBD |
| | | | | | (COCO) mAR | 0.436 | 0.432 | TBD |
| Pose Estimation | HRNET-Posenet | Based on Ref. | FP32 Model | Quantized Model | (COCO) mAP | 0.765 | 0.763 | 0.762 |
| | | | | | (COCO) mAR | 0.793 | 0.792 | 0.791 |
| Super Resolution | SRGAN | GitHub Repo | Pretrained Model (older version from here) | See Example | (BSD100) PSNR / SSIM (Detailed Results) | 25.51 / 0.653 | 25.5 / 0.648 | TBD |
| Super Resolution | Anchor-based Plain Net (ABPN) | Based on Ref. | See Tarballs | See Example | Average PSNR | Results | Results | TBD |
| Super Resolution | Extremely Lightweight Quantization Robust Real-Time Single-Image Super Resolution (XLSR) | Based on Ref. | See Tarballs | See Example | Average PSNR | Results | Results | TBD |
| Super Resolution | Super-Efficient Super Resolution (SESR) | Based on Ref. | See Tarballs | See Example | Average PSNR | Results | Results | TBD |
| Super Resolution | QuickSRNet | - | See Tarballs | See Example | Average PSNR | Results | Results | TBD |
| Semantic Segmentation | DeepLabV3+ | GitHub Repo | Pretrained Model | Quantized Model | (PascalVOC) mIOU | 72.91% | 72.44% | 72.18% |
| Semantic Segmentation | HRNet-W48 | GitHub Repo | Original model weights not available | See Example | (Cityscapes) mIOU | 81.04% | 80.65% | 80.07% |
| Semantic Segmentation | InverseForm (HRNet-16-Slim-IF) | GitHub Repo | Pretrained Model | See Example | (Cityscapes) mIOU | 77.81% | 77.17% | TBD |
| Semantic Segmentation | InverseForm (OCRNet-48) | GitHub Repo | Pretrained Model | See Example | (Cityscapes) mIOU | 86.31% | 86.21% | TBD |
| Semantic Segmentation | FFNets | GitHub Repo | Prepared Models (5 in total) | See Example | mIoU | Results | Results | TBD |
| Speech Recognition | DeepSpeech2 | GitHub Repo | Pretrained Model | See Example | (Librispeech Test Clean) WER | 9.92% | 10.22% | TBD |
| NLP / NLU | Bert | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 83.11 | 82.44 | TBD |
| | | | | | (SQuAD dataset) F1 score (Detailed Results) | 88.48 | 87.47 | TBD |
| NLP / NLU | MobileBert | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 81.24 | 81.17 | TBD |
| | | | | | (SQuAD dataset) F1 score (Detailed Results) | 89.45 | 88.66 | TBD |
| NLP / NLU | MiniLM | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 82.23 | 82.63 | TBD |
| | | | | | (SQuAD dataset) F1 score (Detailed Results) | 90.47 | 89.70 | TBD |
| NLP / NLU | Roberta | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score (Detailed Results) | 85.11 | 84.26 | TBD |
| NLP / NLU | DistilBert | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 80.71 | 80.26 | TBD |
| | | | | | (SQuAD dataset) F1 score (Detailed Results) | 85.42 | 85.18 | TBD |

[1] Model usage documentation
[2] Original FP32 model source
[3] FP32 model checkpoint
[4] Quantized Model: For models quantized with a post-training technique, this refers to the FP32 model, which can then be quantized using AIMET. For models optimized with QAT, this refers to the model checkpoint with fine-tuned weights. 8-bit weights and activations are typically used; for some models, 8-bit weights and 16-bit activations are used to further improve post-training quantization performance.
[5] Results comparing floating-point and quantized performance
[6] W8A8 indicates 8-bit weights, 8-bit activations
[7] W4A8 indicates 4-bit weights, 8-bit activations (some models use a mix of W4A8 and W8A8 layers)

TBD indicates that support is not yet available.
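
As a rough guide, these bitwidth notations map onto the default_param_bw and default_output_bw arguments of QuantizationSimModel shown in the Introduction sketch (per-layer mixed W4A8/W8A8 configurations additionally require per-quantizer settings, not shown here):

```python
# Bitwidth notation -> QuantizationSimModel arguments (a sketch; reuses
# `model` and `dummy_input` from the PTQ example in the Introduction):
#   W8A8              default_param_bw=8, default_output_bw=8
#   W4A8              default_param_bw=4, default_output_bw=8
#   INT8W / INT16Act  default_param_bw=8, default_output_bw=16
sim_w4a8 = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    default_param_bw=4,   # 4-bit weights
    default_output_bw=8,  # 8-bit activations
)
```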

TensorFlow Models

| Task | Network [1] | Model Source [2] | Floating Pt (FP32) Model [3] | Quantized Model [4] | TensorFlow Version | Metric [5] | FP32 | W8A8 [6] | W4A8 [7] |
|---|---|---|---|---|---|---|---|---|---|
| Image Classification | ResNet-50 (v1) | GitHub Repo | Pretrained Model | See Documentation | 1.15 | (ImageNet) Top-1 Accuracy | 75.21% | 74.96% | TBD |
| Image Classification | MobileNet-v2-1.4 | GitHub Repo | Pretrained Model | Quantized Model | 1.15 | (ImageNet) Top-1 Accuracy | 75% | 74.21% | TBD |
| Image Classification | EfficientNet Lite | GitHub Repo | Pretrained Model | Quantized Model | 2.4 | (ImageNet) Top-1 Accuracy | 74.93% | 74.99% | TBD |
| Object Detection | SSD MobileNet-v2 | GitHub Repo | Pretrained Model | See Example | 1.15 | (COCO) Mean Avg. Precision (mAP) | 0.2469 | 0.2456 | TBD |
| Object Detection | RetinaNet | GitHub Repo | Pretrained Model | See Example | 1.15 | (COCO) mAP (Detailed Results) | 0.35 | 0.349 | TBD |
| Object Detection | MobileDet-EdgeTPU | GitHub Repo | Pretrained Model | See Example | 2.4 | (COCO) Mean Avg. Precision (mAP) | 0.281 | 0.279 | TBD |
| Pose Estimation | Pose Estimation | Based on Ref. | Based on Ref. | Quantized Model | 2.4 | (COCO) mAP | 0.383 | 0.379 | TBD |
| | | | | | | (COCO) mAR | 0.452 | 0.446 | TBD |
| Super Resolution | SRGAN | GitHub Repo | Pretrained Model | See Example | 2.4 | (BSD100) PSNR / SSIM (Detailed Results) | 25.45 / 0.668 | 24.78 / 0.628 | 25.41 / 0.666 (INT8W / INT16Act.) |

[1] Model usage documentation
[2] Original FP32 model source
[3] FP32 model checkpoint
[4] Quantized Model: For models quantized with a post-training technique, this refers to the FP32 model, which can then be quantized using AIMET. For models optimized with QAT, this refers to the model checkpoint with fine-tuned weights. 8-bit weights and activations are typically used; for some models, 8-bit weights and 16-bit activations (INT8W / INT16Act.) are used to further improve post-training quantization performance.
[5] Results comparing floating-point and quantized performance
[6] W8A8 indicates 8-bit weights, 8-bit activations
[7] W4A8 indicates 4-bit weights, 8-bit activations (some models use a mix of W4A8 and W8A8 layers)

TBD indicates that support is not yet available.

Usage

Install AIMET

Before you can run the evaluation script for a specific model, you need to install the AI Model Efficiency Toolkit (AIMET). See the Getting Started page for an overview, then install AIMET and its dependencies by following the Installation instructions.

Run model evaluation

The evaluation scripts run floating-point and quantized evaluations that demonstrate the improved performance of quantized models obtained with AIMET techniques. They generate and display the final accuracy results (as documented in the tables above). For documentation and step-by-step procedures for a specific model, refer to the relevant TensorFlow or PyTorch model folder.
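
As a rough sketch of what an evaluation script does (the evaluate helper and val_loader below are illustrative placeholders, not the actual model-zoo code; model and sim are the FP32 model and QuantizationSimModel from the Introduction sketch):

```python
# Evaluate the FP32 model and the quantization-simulated model with the
# same pipeline, then report both numbers (as in the tables above).
# `val_loader` is a placeholder for a real validation dataloader.
def evaluate(eval_model, loader):
    correct, total = 0, 0
    eval_model.eval()
    with torch.no_grad():
        for images, labels in loader:
            preds = eval_model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return 100.0 * correct / total

fp32_acc = evaluate(model, val_loader)       # FP32 baseline
quant_acc = evaluate(sim.model, val_loader)  # simulated W8A8
print(f"Top-1 accuracy  FP32: {fp32_acc:.2f}%  W8A8: {quant_acc:.2f}%")
```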

Team

AIMET Model Zoo is a project maintained by Qualcomm Innovation Center, Inc.

License

Please see the LICENSE file for details.
