OpenVINO™ toolkit provides a set of public models that you can use for learning and demo purposes or for developing deep learning software. The most recent version is available in the repository on GitHub.
The models can be downloaded via the Model Downloader
(<OPENVINO_INSTALL_DIR>/deployment_tools/open_model_zoo/tools/downloader).
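The download can also be scripted. The sketch below drives `downloader.py` with its `--name`, `--precisions`, and `--output_dir` options; the install path is an assumption (taken here from the `INTEL_OPENVINO_DIR` environment variable with a common default), so adjust it to your environment.

```python
import os
import subprocess

# Assumed install location; set INTEL_OPENVINO_DIR to match your system.
install_dir = os.environ.get("INTEL_OPENVINO_DIR", "/opt/intel/openvino")
downloader = os.path.join(
    install_dir, "deployment_tools", "open_model_zoo", "tools", "downloader",
    "downloader.py")

if os.path.exists(downloader):
    # Fetch only the FP16 weights of one model into ./models.
    subprocess.run(
        ["python3", downloader, "--name", "alexnet",
         "--precisions", "FP16", "--output_dir", "models"],
        check=True)
else:
    print("Model Downloader not found; check your OpenVINO installation.")
```

Use the OMZ Model Name column from the tables below as the value of `--name`.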
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
AlexNet | Caffe* | alexnet | | 1.5 | 60.965 |
CaffeNet | Caffe* | caffenet | | 1.5 | 60.965 |
DenseNet 121 | Caffe*<br>TensorFlow*<br>Caffe2* | densenet-121<br>densenet-121-tf<br>densenet-121-caffe2 | | 5.289~5.724 | 7.971 |
DenseNet 161 | Caffe*<br>TensorFlow* | densenet-161<br>densenet-161-tf | | 14.128~15.561 | 28.666 |
DenseNet 169 | Caffe*<br>TensorFlow* | densenet-169<br>densenet-169-tf | | 6.16~6.788 | 14.139 |
DenseNet 201 | Caffe* | densenet-201 | | 8.673 | 20.001 |
EfficientNet B0 | TensorFlow*<br>PyTorch* | efficientnet-b0<br>efficientnet-b0-pytorch | 75.70/92.76<br>76.91/93.21 | 0.819 | 5.268 |
EfficientNet B0 AutoAugment | TensorFlow* | efficientnet-b0_auto_aug | 76.43/93.04 | 0.819 | 5.268 |
EfficientNet B5 | TensorFlow*<br>PyTorch* | efficientnet-b5<br>efficientnet-b5-pytorch | 83.33/96.67<br>83.69/96.71 | 21.252 | 30.303 |
EfficientNet B7 | PyTorch* | efficientnet-b7-pytorch | 84.42/96.91 | 77.618 | 66.193 |
EfficientNet B7 AutoAugment | TensorFlow* | efficientnet-b7_auto_aug | 84.68/97.09 | 77.618 | 66.193 |
Inception (GoogleNet) V1 | Caffe* | googlenet-v1 | | 3.266 | 6.999 |
Inception (GoogleNet) V2 | Caffe* | googlenet-v2 | | 4.058 | 11.185 |
Inception (GoogleNet) V3 | Caffe*<br>PyTorch* | googlenet-v3<br>googlenet-v3-pytorch | | 11.469 | 23.817 |
Inception (GoogleNet) V4 | Caffe* | googlenet-v4 | | 24.584 | 42.648 |
Inception-ResNet V2 | Caffe*<br>TensorFlow* | inception-resnet-v2<br>inception-resnet-v2-tf | | 22.227~26.405 | 30.223~55.813 |
MobileNet V1 0.25 128 | Caffe* | mobilenet-v1-0.25-128 | | 0.028 | 0.468 |
MobileNet V1 0.5 160 | Caffe* | mobilenet-v1-0.50-160 | | 0.156 | 1.327 |
MobileNet V1 0.5 224 | Caffe* | mobilenet-v1-0.50-224 | | 0.304 | 1.327 |
MobileNet V1 1.0 224 | Caffe*<br>TensorFlow* | mobilenet-v1-1.0-224<br>mobilenet-v1-1.0-224-tf | | 1.148 | 4.221 |
MobileNet V2 1.0 224 | Caffe*<br>TensorFlow*<br>PyTorch* | mobilenet-v2<br>mobilenet-v2-1.0-224<br>mobilenet-v2-pytorch | | 0.615~0.876 | 3.489 |
MobileNet V2 1.4 224 | TensorFlow* | mobilenet-v2-1.4-224 | | 1.183 | 6.087 |
ResNet 50 | Caffe*<br>PyTorch*<br>Caffe2* | resnet-50<br>resnet-50-pytorch<br>resnet-50-caffe2 | | 6.996~8.216 | 25.53 |
ResNet 101 | Caffe* | resnet-101 | | 14.441 | 44.496 |
ResNet 152 | Caffe* | resnet-152 | | 21.89 | 60.117 |
SE-Inception | Caffe* | se-inception | | 4.091 | 11.922 |
SE-ResNet 50 | Caffe* | se-resnet-50 | | 7.775 | 28.061 |
SE-ResNet 101 | Caffe* | se-resnet-101 | | 15.239 | 49.274 |
SE-ResNet 152 | Caffe* | se-resnet-152 | | 22.709 | 66.746 |
SE-ResNeXt 50 | Caffe* | se-resnext-50 | | 8.533 | 27.526 |
SE-ResNeXt 101 | Caffe* | se-resnext-101 | | 16.054 | 48.886 |
SqueezeNet v1.0 | Caffe* | squeezenet1.0 | | 1.737 | 1.248 |
SqueezeNet v1.1 | Caffe*<br>Caffe2* | squeezenet1.1<br>squeezenet1.1-caffe2 | | 0.785 | 1.236 |
VGG 16 | Caffe* | vgg16 | | 30.974 | 138.358 |
VGG 19 | Caffe*<br>Caffe2* | vgg19<br>vgg19-caffe2 | | 39.3 | 143.667 |
Octave Convolution Networks
These are modifications of the networks above that use Octave Convolutions. More details can be found here.
Model Name | Implementation | OMZ Model Name | Accuracy | GFlops | mParams |
---|---|---|---|---|---|
DenseNet 121, alpha=0.125 | MXNet* | octave-densenet-121-0.125 | | 4.883 | 7.977 |
ResNet 26, alpha=0.25 | MXNet* | octave-resnet-26-0.25 | | 3.768 | 15.99 |
ResNet 50, alpha=0.125 | MXNet* | octave-resnet-50-0.125 | | 7.221 | 25.551 |
ResNet 101, alpha=0.125 | MXNet* | octave-resnet-101-0.125 | | 13.387 | 44.543 |
ResNet 200, alpha=0.125 | MXNet* | octave-resnet-200-0.125 | | 25.407 | 64.667 |
ResNeXt 50, alpha=0.25 | MXNet* | octave-resnext-50-0.25 | | 6.444 | 25.02 |
ResNeXt 101, alpha=0.25 | MXNet* | octave-resnext-101-0.25 | | 11.521 | 44.169 |
SE-ResNet 50, alpha=0.125 | MXNet* | octave-se-resnet-50-0.125 | | 7.246 | 28.082 |
Semantic segmentation is an extension of the object detection problem. Instead of returning bounding boxes, semantic segmentation models return a "painted" version of the input image, where the "color" of each pixel represents a certain class. These networks are much bigger than the corresponding object detection networks, but they provide better (pixel-level) localization of objects and can detect areas with complex shapes.
Model Name | Implementation | OMZ Model Name | GFlops | mParams |
---|---|---|---|---|
DeepLab V3 | TensorFlow* | deeplabv3 | 11.469 | 23.819 |
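The "painted" output described above is typically obtained by taking, for every pixel, the class with the highest score. A minimal sketch, with illustrative shapes and scores not tied to any specific model:

```python
# Per-pixel class scores for a tiny 2x2 image with 3 classes: scores[y][x][c].
# A real segmentation network emits the same layout, just much larger.
scores = [
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    [[0.3, 0.3, 0.4], [0.2, 0.5, 0.3]],
]

# Argmax over the class axis yields the "painted" map of class indices.
painted = [
    [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
    for row in scores
]
print(painted)  # [[1, 0], [2, 1]]
```

Each index in `painted` is then mapped to a display color by the application.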
Instance segmentation is an extension of the object detection and semantic segmentation problems. Instead of predicting a bounding box around each object instance, an instance segmentation model outputs a pixel-wise mask for every instance.
Model Name | Implementation | OMZ Model Name | GFlops | mParams |
---|---|---|---|---|
Mask R-CNN Inception ResNet V2 | TensorFlow* | mask_rcnn_inception_resnet_v2_atrous_coco | 675.314 | 92.368 |
Mask R-CNN Inception V2 | TensorFlow* | mask_rcnn_inception_v2_coco | 54.926 | 21.772 |
Mask R-CNN ResNet 50 | TensorFlow* | mask_rcnn_resnet50_atrous_coco | 294.738 | 50.222 |
Mask R-CNN ResNet 101 | TensorFlow* | mask_rcnn_resnet101_atrous_coco | 674.58 | 69.188 |
Model Name | Implementation | OMZ Model Name | GFlops | mParams |
---|---|---|---|---|
Brain Tumor Segmentation | MXNet* | brain-tumor-segmentation-0001 | 409.996 | 38.192 |
Several detection models can be used to detect a set of the most popular objects, such as faces, people, and vehicles. Most of the networks are SSD-based and provide reasonable accuracy/performance trade-offs.
Model Name | Implementation | OMZ Model Name | GFlops | mParams |
---|---|---|---|---|
CTPN | TensorFlow* | ctpn | 55.813 | 17.237 |
CenterNet (CTDET with DLAV0) 384x384 | ONNX* | ctdet_coco_dlav0_384 | 34.994 | 17.911 |
CenterNet (CTDET with DLAV0) 512x512 | ONNX* | ctdet_coco_dlav0_512 | 62.211 | 17.911 |
Faster R-CNN with Inception-ResNet v2 | TensorFlow* | faster_rcnn_inception_resnet_v2_atrous_coco | 30.687 | 13.307 |
Faster R-CNN with Inception v2 | TensorFlow* | faster_rcnn_inception_v2_coco | 30.687 | 13.307 |
Faster R-CNN with ResNet 50 | TensorFlow* | faster_rcnn_resnet50_coco | 57.203 | 29.162 |
Faster R-CNN with ResNet 101 | TensorFlow* | faster_rcnn_resnet101_coco | 112.052 | 48.128 |
MTCNN | Caffe* | mtcnn-p (proposal)<br>mtcnn-r (refine)<br>mtcnn-o (output) | | |
SSD 300 | Caffe* | ssd300 | 62.815 | 26.285 |
SSD 512 | Caffe* | ssd512 | 180.611 | 27.189 |
SSD with MobileNet | Caffe*<br>TensorFlow* | mobilenet-ssd<br>ssd_mobilenet_v1_coco | 2.316~2.494 | 5.783~6.807 |
SSD with MobileNet FPN | TensorFlow* | ssd_mobilenet_v1_fpn_coco | 123.309 | 36.188 |
SSD with MobileNet V2 | TensorFlow* | ssd_mobilenet_v2_coco | 3.775 | 16.818 |
SSD lite with MobileNet V2 | TensorFlow* | ssdlite_mobilenet_v2 | 1.525 | 4.475 |
Model Name | Implementation | OMZ Model Name | GFlops | mParams |
---|---|---|---|---|
FaceNet | TensorFlow* | facenet-20180408-102900 | 2.846 | 23.469 |
LResNet34E-IR,ArcFace@ms1m-refine-v1 | MXNet* | face-recognition-resnet34-arcface | 8.934 | 34.129 |
LResNet50E-IR,ArcFace@ms1m-refine-v1 | MXNet* | face-recognition-resnet50-arcface | 12.637 | 43.576 |
LResNet100E-IR,ArcFace@ms1m-refine-v2 | MXNet* | face-recognition-resnet100-arcface | 24.209 | 65.131 |
MobileFaceNet,ArcFace@ms1m-refine-v1 | MXNet* | face-recognition-mobilefacenet-arcface | 0.449 | 0.993 |
SphereFace | Caffe* | Sphereface | 3.504 | 22.671 |
The human pose estimation task is to predict a pose, i.e. a body skeleton consisting of keypoints and the connections between them, for every person in an input image or video. Keypoints are body joints such as ears, eyes, nose, shoulders, and knees. There are two major groups of such methods: top-down and bottom-up. The first group detects persons in a given frame, crops or rescales each detection, and then runs a pose estimation network on every detection; these methods are very accurate. The second group finds all keypoints in a given frame and then groups them by person instance; this is faster, because the network runs only once per frame.
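The speed difference between the two groups can be illustrated by counting network forward passes per frame. This is a toy sketch with hypothetical numbers, not a benchmark of any specific model:

```python
def forward_passes(method: str, num_people: int) -> int:
    """Rough per-frame cost model for the two pose-estimation families.

    Top-down: one detector pass plus one pose-network pass per detected person.
    Bottom-up: a single pass finds all keypoints, which are grouped afterwards.
    """
    if method == "top-down":
        return 1 + num_people
    return 1

# Cost grows with crowd size for top-down, but stays constant for bottom-up.
print(forward_passes("top-down", 5))   # 6
print(forward_passes("bottom-up", 5))  # 1
```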
Model Name | Implementation | OMZ Model Name | GFlops | mParams |
---|---|---|---|---|
human-pose-estimation-3d-0001 | PyTorch* | human-pose-estimation-3d-0001 | 18.998 | 5.074 |
single-human-pose-estimation-0001 | PyTorch* | single-human-pose-estimation-0001 | 60.125 | 33.165 |
[*] Other names and brands may be claimed as the property of others.