We provide a collection of popular neural network models and compare their floating-point and quantized performance. The results demonstrate that quantized models can achieve accuracy comparable to their floating-point counterparts. Alongside the results, we provide scripts and artifacts for users to quantize floating-point models using the AI Model Efficiency ToolKit (AIMET).
Quantized inference is significantly faster than floating-point inference, and enables models to run in a power-efficient manner on mobile and edge devices. We use AIMET, a library that includes state-of-the-art techniques for quantization, to quantize various models available in PyTorch and TensorFlow frameworks.
An original FP32 source model is quantized using either the post-training quantization (PTQ) or the quantization-aware training (QAT) techniques available in AIMET. Example evaluation scripts are provided for each model. Where PTQ is needed, the evaluation script performs PTQ before evaluation; where QAT is used, the fine-tuned model checkpoint is also provided.
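As a rough illustration of the PTQ flow described above, the sketch below wraps a PyTorch model in AIMET's `QuantizationSimModel`, calibrates it on representative data, and hands the simulated-quantized model to an existing evaluation harness. This is a minimal sketch, not the zoo's per-model script: `model`, `calibration_loader`, and `evaluate` are assumed to be supplied by the caller, and the dummy-input shape assumes an ImageNet classifier.

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

def quantize_and_eval(model, calibration_loader, evaluate):
    model.eval()
    dummy_input = torch.randn(1, 3, 224, 224)  # shape assumed for an ImageNet classifier

    # Wrap the FP32 model with quantization-simulation ops (W8A8 here)
    sim = QuantizationSimModel(model,
                               dummy_input=dummy_input,
                               quant_scheme=QuantScheme.post_training_tf_enhanced,
                               default_param_bw=8,
                               default_output_bw=8)

    # Calibrate: run representative data through the model so AIMET can
    # compute quantization encodings (scale/offset) for each tensor
    def pass_calibration_data(sim_model, _):
        with torch.no_grad():
            for images, _labels in calibration_loader:
                sim_model(images)

    sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                          forward_pass_callback_args=None)

    # Evaluate the simulated-quantized model with the same harness as FP32
    return evaluate(sim.model)
```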
| Task | Network [1] | Model Source [2] | Floating Pt (FP32) Model [3] | Quantized Model [4] | Metric | FP32 | W8A8 [6] | W4A8 [7] |
|---|---|---|---|---|---|---|---|---|
| Image Classification | MobileNetV2 | GitHub Repo | Pretrained Model | Quantized Model | (ImageNet) Top-1 Accuracy | 71.67% | 71.14% | TBD |
| Image Classification | ResNet18 | PyTorch Torchvision | PyTorch Torchvision | Quantized Model | (ImageNet) Top-1 Accuracy | 69.75% | 69.54% | 69.10% |
| Image Classification | ResNet50 | PyTorch Torchvision | PyTorch Torchvision | Quantized Model | (ImageNet) Top-1 Accuracy | 76.14% | 75.81% | 75.63% |
| Image Classification | RegNet_x_3_2gf | PyTorch Torchvision | PyTorch Torchvision | Quantized Model | (ImageNet) Top-1 Accuracy | 78.36% | 78.10% | 77.70% |
| Image Classification | EfficientNet-lite0 | GitHub Repo | Pretrained Model | Quantized Model | (ImageNet) Top-1 Accuracy | 75.40% | 75.36% | 74.46% |
| Image Classification | ViT | Repo | Prepared Models | See Example | (ImageNet) Accuracy | 81.32 | 81.57 | TBD |
| Image Classification | MobileViT | Repo | Prepared Models | See Example | (ImageNet) Accuracy | 78.46 | 77.59 | TBD |
| Object Detection | MobileNetV2-SSD-Lite | GitHub Repo | Pretrained Model | Quantized Model | (PascalVOC) mAP | 68.7% | 68.6% | TBD |
| Pose Estimation | Pose Estimation | Based on Ref. | Based on Ref. | Quantized Model | (COCO) mAP | 0.364 | 0.359 | TBD |
| | | | | | (COCO) mAR | 0.436 | 0.432 | TBD |
| Pose Estimation | HRNet-Posenet | Based on Ref. | FP32 Model | Quantized Model | (COCO) mAP | 0.765 | 0.763 | 0.762 |
| | | | | | (COCO) mAR | 0.793 | 0.792 | 0.791 |
| Super Resolution | SRGAN | GitHub Repo | Pretrained Model (older version from here) | See Example | (BSD100) PSNR / SSIM (Detailed Results) | 25.51 / 0.653 | 25.5 / 0.648 | TBD |
| Super Resolution | Anchor-based Plain Net (ABPN) | Based on Ref. | See Tarballs | See Example | Average PSNR | See Results | See Results | TBD |
| Super Resolution | Extremely Lightweight Quantization Robust Real-Time Single-Image Super Resolution (XLSR) | Based on Ref. | See Tarballs | See Example | Average PSNR | See Results | See Results | TBD |
| Super Resolution | Super-Efficient Super Resolution (SESR) | Based on Ref. | See Tarballs | See Example | Average PSNR | See Results | See Results | TBD |
| Super Resolution | QuickSRNet | - | See Tarballs | See Example | Average PSNR | See Results | See Results | TBD |
| Semantic Segmentation | DeepLabV3+ | GitHub Repo | Pretrained Model | Quantized Model | (PascalVOC) mIOU | 72.91% | 72.44% | 72.18% |
| Semantic Segmentation | HRNet-W48 | GitHub Repo | Original model weights not available | See Example | (Cityscapes) mIOU | 81.04% | 80.65% | 80.07% |
| Semantic Segmentation | InverseForm (HRNet-16-Slim-IF) | GitHub Repo | Pretrained Model | See Example | (Cityscapes) mIOU | 77.81% | 77.17% | TBD |
| Semantic Segmentation | InverseForm (OCRNet-48) | GitHub Repo | Pretrained Model | See Example | (Cityscapes) mIOU | 86.31% | 86.21% | TBD |
| Semantic Segmentation | FFNets | GitHub Repo | Prepared Models (5 in total) | See Example | mIoU | See Results | See Results | TBD |
| Speech Recognition | DeepSpeech2 | GitHub Repo | Pretrained Model | See Example | (LibriSpeech Test Clean) WER | 9.92% | 10.22% | TBD |
| NLP / NLU | BERT | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 83.11 | 82.44 | TBD |
| | | | | | (SQuAD dataset) F1 score | 88.48 | 87.47 | TBD |
| | | | | | Detailed Results | | | |
| NLP / NLU | MobileBERT | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 81.24 | 81.17 | TBD |
| | | | | | (SQuAD dataset) F1 score | 89.45 | 88.66 | TBD |
| | | | | | Detailed Results | | | |
| NLP / NLU | MiniLM | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 82.23 | 82.63 | TBD |
| | | | | | (SQuAD dataset) F1 score | 90.47 | 89.70 | TBD |
| | | | | | Detailed Results | | | |
| NLP / NLU | RoBERTa | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 85.11 | 84.26 | TBD |
| | | | | | Detailed Results | | | |
| NLP / NLU | DistilBERT | Repo | Prepared Models | See Example | (GLUE dataset) GLUE score | 80.71 | 80.26 | TBD |
| | | | | | (SQuAD dataset) F1 score | 85.42 | 85.18 | TBD |
| | | | | | Detailed Results | | | |
[1] Model usage documentation
[2] Original FP32 model source
[3] FP32 model checkpoint
[4] Quantized Model: For models quantized with a post-training technique, this points to the FP32 model, which can then be quantized using AIMET. For models optimized with QAT, it points to the model checkpoint with fine-tuned weights. 8-bit weights and activations are typically used. For some models, 8-bit weights and 16-bit activations are used to further improve the performance of post-training quantization.
[5] Results comparing float and quantized performance
[6] W8A8 indicates 8-bit weights, 8-bit activations
[7] W4A8 indicates 4-bit weights, 8-bit activations (some models mix W4A8 and W8A8 layers; see the sketch after this list).
TBD indicates that support is NOT yet available
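Footnote [7] notes that the W4A8 results can mix 4-bit and 8-bit weight layers. One plausible way to express such a mix with the `aimet_torch` v1 API is sketched below: after building a `QuantizationSimModel`, lower the weight quantizer of selected wrapped layers to 4 bits and recompute encodings. The layer selection and the `layer_names` argument are illustrative assumptions, not the zoo's actual per-model configuration.

```python
from aimet_torch.qc_quantize_op import QcQuantizeWrapper

def set_w4_weights(sim, layer_names):
    """Drop the weight quantizers of the named wrapped layers to 4 bits,
    leaving all activations at 8 bits (the W4A8/W8A8 mix of footnote [7])."""
    for name, module in sim.model.named_modules():
        if isinstance(module, QcQuantizeWrapper) and name in layer_names:
            weight_quantizer = module.param_quantizers.get('weight')
            if weight_quantizer is not None:
                weight_quantizer.bitwidth = 4
    # Encodings must be recomputed via sim.compute_encodings(...) after any
    # bitwidth change, before the model is evaluated or exported.
```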
| Task | Network [1] | Model Source [2] | Floating Pt (FP32) Model [3] | Quantized Model [4] | TensorFlow Version | Metric | FP32 | W8A8 [6] | W4A8 [7] |
|---|---|---|---|---|---|---|---|---|---|
| Image Classification | ResNet-50 (v1) | GitHub Repo | Pretrained Model | See Documentation | 1.15 | (ImageNet) Top-1 Accuracy | 75.21% | 74.96% | TBD |
| Image Classification | MobileNet-v2-1.4 | GitHub Repo | Pretrained Model | Quantized Model | 1.15 | (ImageNet) Top-1 Accuracy | 75% | 74.21% | TBD |
| Image Classification | EfficientNet Lite | GitHub Repo | Pretrained Model | Quantized Model | 2.4 | (ImageNet) Top-1 Accuracy | 74.93% | 74.99% | TBD |
| Object Detection | SSD MobileNet-v2 | GitHub Repo | Pretrained Model | See Example | 1.15 | (COCO) mAP | 0.2469 | 0.2456 | TBD |
| Object Detection | RetinaNet | GitHub Repo | Pretrained Model | See Example | 1.15 | (COCO) mAP (Detailed Results) | 0.35 | 0.349 | TBD |
| Object Detection | MobileDet-EdgeTPU | GitHub Repo | Pretrained Model | See Example | 2.4 | (COCO) mAP | 0.281 | 0.279 | TBD |
| Pose Estimation | Pose Estimation | Based on Ref. | Based on Ref. | Quantized Model | 2.4 | (COCO) mAP | 0.383 | 0.379 | TBD |
| | | | | | | (COCO) mAR | 0.452 | 0.446 | TBD |
| Super Resolution | SRGAN | GitHub Repo | Pretrained Model | See Example | 2.4 | (BSD100) PSNR / SSIM (Detailed Results) | 25.45 / 0.668 | 24.78 / 0.628 | 25.41 / 0.666 (INT8W / INT16Act.) |
[1] Model usage documentation
[2] Original FP32 model source
[3] FP32 model checkpoint
[4] Quantized Model: For models quantized with post-training technique, refers to FP32 model which can then be quantized using AIMET. For models optimized with QAT, refers to model checkpoint with fine-tuned weights. 8-bit weights and activations are typically used. For some models, 8-bit weights and 16-bit activations (INT8W/INT16Act.) are used to further improve performance of post-training quantization.
[5] Results comparing float and quantized performance
[6] W8A8 indicates 8-bit weights, 8-bit activations
[7] W4A8 indicates 4-bit weights, 8-bit activations (Some models include a mix of W4A8 and W8A8 layers).
TBD indicates that support is NOT yet available
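For the TensorFlow entries above, the quantization-simulation flow is analogous. The sketch below is a hedged example assuming a Keras model and the `aimet_tensorflow.keras` variant of the API, which is available in recent AIMET releases (the TF 1.15 models in the table instead use the session-based `aimet_tensorflow.quantsim` API). `model` and `calibration_dataset` are assumed inputs supplied by the caller.

```python
from aimet_tensorflow.keras.quantsim import QuantizationSimModel

def quantize_keras_model(model, calibration_dataset):
    # Wrap the Keras model with W8A8 quantization-simulation ops
    sim = QuantizationSimModel(model,
                               default_param_bw=8,
                               default_output_bw=8)

    # Calibrate on representative batches to compute quantization encodings
    def pass_calibration_data(sim_model, _):
        for batch in calibration_dataset:
            sim_model(batch, training=False)

    sim.compute_encodings(pass_calibration_data, None)
    return sim  # sim.model is then evaluated exactly like the FP32 model
```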
Before you can run the evaluation script for a specific model, you need to install the AI Model Efficiency ToolKit (AIMET) software. Please see this Getting Started page for an overview. Then install AIMET and its dependencies using these Installation instructions.
The evaluation scripts run both floating-point and quantized evaluations, demonstrating the quantized-model performance achieved with AIMET techniques. They generate and display the final accuracy results (as documented in the tables above). For the documentation and procedure for a specific model, refer to the relevant TensorFlow or PyTorch model folder.
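Schematically, each evaluation script does something like the following sketch: score the FP32 model, build and calibrate a quantization simulation (as in the earlier PyTorch sketch), then score the quantized model with the same harness and report both numbers. `top1_accuracy` is a generic stand-in for illustration; the real per-model scripts use each task's own metric (mAP, mIOU, PSNR, WER, and so on) and data pipeline.

```python
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

def top1_accuracy(model, loader):
    """Fraction of samples whose argmax prediction matches the label."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            correct += (model(images).argmax(dim=1) == labels).sum().item()
            total += labels.numel()
    return correct / total

def compare_fp32_vs_quantized(model, val_loader, calib_loader, dummy_input):
    # 1. Score the original floating-point model
    fp32_acc = top1_accuracy(model, val_loader)

    # 2. Build a W8A8 quantization simulation and calibrate it
    sim = QuantizationSimModel(model, dummy_input=dummy_input,
                               quant_scheme=QuantScheme.post_training_tf_enhanced,
                               default_param_bw=8, default_output_bw=8)
    sim.compute_encodings(lambda m, _: top1_accuracy(m, calib_loader), None)

    # 3. Score the quantized model with the identical harness and report both
    quant_acc = top1_accuracy(sim.model, val_loader)
    print(f"FP32 top-1: {fp32_acc:.2%} | W8A8 top-1: {quant_acc:.2%}")
```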
AIMET Model Zoo is a project maintained by Qualcomm Innovation Center, Inc.
Please see the LICENSE file for details.