GluonCV 0.6.0 Release
Highlights
GluonCV v0.6.0 adds more video classification models, pose estimation models suitable for mobile inference, and quantized models for video classification and pose estimation, along with multiple usability and code improvements.
More video action recognition models
https://gluon-cv.mxnet.io/model_zoo/action_recognition.html
We now provide state-of-the-art video classification networks, such as I3D, I3D-Nonlocal and SlowFast, with a complete model zoo over several widely adopted video datasets. We also provide a general video dataloader that can handle both frame and raw-video formats. Users can do training, fine-tuning, prediction and feature extraction without writing complicated code; preparing a text file containing the video information is enough.
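As a minimal sketch of that custom-dataloader path (assuming the `VideoClsCustom` dataset class used in the tutorials; the paths and setting file here are hypothetical):

```python
from gluoncv.data import VideoClsCustom

# Each line of train.txt lists one video: <video_path> <num_frames> <label>,
# e.g. "/data/videos/abseiling_0.mp4 250 0".
train_dataset = VideoClsCustom(root='/data/videos',
                               setting='/data/train.txt',
                               train=True,
                               new_length=32,      # frames per clip
                               video_loader=True,  # read raw videos, not frame folders
                               use_decord=True)    # decode with Decord
print('Loaded %d training samples.' % len(train_dataset))
```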
Below is the table of new models included in this release.
Name | Pretrained | Segments | Clip Length | Top-1 | Hashtag |
---|---|---|---|---|---|
inceptionv1_kinetics400 | ImageNet | 7 | 1 | 69.1 | 6dcdafb1 |
inceptionv3_kinetics400 | ImageNet | 7 | 1 | 72.5 | 8a4a6946 |
resnet18_v1b_kinetics400 | ImageNet | 7 | 1 | 65.5 | 46d5a985 |
resnet34_v1b_kinetics400 | ImageNet | 7 | 1 | 69.1 | 8a8d0d8d |
resnet50_v1b_kinetics400 | ImageNet | 7 | 1 | 69.9 | cc757e5c |
resnet101_v1b_kinetics400 | ImageNet | 7 | 1 | 71.3 | 5bb6098e |
resnet152_v1b_kinetics400 | ImageNet | 7 | 1 | 71.5 | 9bc70c66 |
i3d_inceptionv1_kinetics400 | ImageNet | 1 | 32 (64/2) | 71.8 | 81e0be10 |
i3d_inceptionv3_kinetics400 | ImageNet | 1 | 32 (64/2) | 73.6 | f14f8a99 |
i3d_resnet50_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 74.0 | 568a722e |
i3d_resnet101_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 75.1 | 6b69f655 |
i3d_nl5_resnet50_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 75.2 | 3c0e47ea |
i3d_nl10_resnet50_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 75.3 | bfb58c41 |
i3d_nl5_resnet101_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 76.0 | fbfc1d30 |
i3d_nl10_resnet101_v1_kinetics400 | ImageNet | 1 | 32 (64/2) | 76.1 | 59186c31 |
slowfast_4x16_resnet50_kinetics400 | ImageNet | 1 | 36 (64/1) | 75.3 | 9d650f51 |
slowfast_8x8_resnet50_kinetics400 | ImageNet | 1 | 40 (64/1) | 76.6 | d6b25339 |
slowfast_8x8_resnet101_kinetics400 | ImageNet | 1 | 40 (64/1) | 77.2 | fbde1a7c |
resnet50_v1b_ucf101 | ImageNet | 3 | 1 | 83.7 | d728ecc7 |
i3d_resnet50_v1_ucf101 | ImageNet | 1 | 32 (64/2) | 83.9 | 7afc7286 |
i3d_resnet50_v1_ucf101 | Kinetics400 | 1 | 32 (64/2) | 95.4 | 760d0981 |
resnet50_v1b_hmdb51 | ImageNet | 3 | 1 | 55.2 | 682591e2 |
i3d_resnet50_v1_hmdb51 | ImageNet | 1 | 32 (64/2) | 48.5 | 0d0ad559 |
i3d_resnet50_v1_hmdb51 | Kinetics400 | 1 | 32 (64/2) | 70.9 | 2ec6bf01 |
resnet50_v1b_sthsthv2 | ImageNet | 8 | 1 | 35.5 | 80ee0c6b |
i3d_resnet50_v1_sthsthv2 | ImageNet | 1 | 16 (32/2) | 50.6 | 01961e4c |
We include a tutorial on how to fine-tune a pre-trained model on your own dataset.
https://gluon-cv.mxnet.io/build/examples_action_recognition/finetune_custom.html
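As a rough sketch of that tutorial's approach (the `_custom` model variant rebuilds the classification head; the class count of 101 is a placeholder for your own dataset):

```python
import mxnet as mx
from gluoncv.model_zoo import get_model

# Load Kinetics400-pretrained weights into a network whose output layer
# is re-created for the target dataset's number of classes.
net = get_model('i3d_resnet50_v1_custom', nclass=101, pretrained=True)
net.collect_params().reset_ctx(mx.gpu(0))
trainer = mx.gluon.Trainer(net.collect_params(), 'sgd',
                           {'learning_rate': 0.001, 'momentum': 0.9, 'wd': 1e-4})
```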
We include a tutorial introducing Decord, a new efficient video reader.
https://gluon-cv.mxnet.io/build/examples_action_recognition/decord_loader.html
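A minimal usage sketch of Decord (the video path is a placeholder):

```python
from decord import VideoReader

vr = VideoReader('abseiling_0.mp4')
print('Total frames:', len(vr))

# Random access by index, or a batched read of ~32 evenly spaced frames.
frame = vr[0]                                  # single frame, HxWx3
step = max(1, len(vr) // 32)
clip = vr.get_batch(list(range(0, len(vr), step))[:32])
print('Clip shape:', clip.shape)
```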
We include a tutorial on how to extract features with a pre-trained model.
https://gluon-cv.mxnet.io/build/examples_action_recognition/feat_custom.html
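A hedged sketch of feature extraction, assuming `get_model` accepts the `feat_ext=True` flag used in the tutorial; the random tensor stands in for a real preprocessed clip:

```python
import mxnet as mx
from gluoncv.model_zoo import get_model

net = get_model('i3d_resnet50_v1_kinetics400', pretrained=True, feat_ext=True)

# I3D input layout: (batch, channels, frames, height, width).
clip = mx.nd.random.uniform(shape=(1, 3, 32, 224, 224))
feat = net(clip)
print('Feature shape:', feat.shape)  # e.g. (1, 2048) for this backbone
```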
We include a tutorial on how to make predictions with a pre-trained model.
https://gluon-cv.mxnet.io/build/examples_action_recognition/demo_custom.html
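A minimal prediction sketch (the random tensor again stands in for a decoded and normalized 32-frame clip):

```python
import mxnet as mx
from gluoncv.model_zoo import get_model

net = get_model('i3d_resnet50_v1_kinetics400', pretrained=True)

clip = mx.nd.random.uniform(shape=(1, 3, 32, 224, 224))
pred = net(clip)
top5 = pred.topk(k=5)[0].asnumpy().astype('int')
print('Top-5 predicted Kinetics400 class ids:', top5)
```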
We include a tutorial on how to perform distributed training of deep video models.
https://gluon-cv.mxnet.io/build/examples_distributed/distributed_slowfast.html
We include tutorials on how to prepare the HMDB51 and Something-Something-V2 datasets.
https://gluon-cv.mxnet.io/build/examples_datasets/hmdb51.html
https://gluon-cv.mxnet.io/build/examples_datasets/somethingsomethingv2.html
We will provide Kinetics600 and Kinetics700 pre-trained models in the next release; stay tuned.
Mobile pose estimation models
https://gluon-cv.mxnet.io/model_zoo/pose.html#mobile-pose-models
Model | OKS AP | OKS AP (with flip) | Hashtag |
---|---|---|---|
mobile_pose_resnet18_v1b | 66.2/89.2/74.3 | 67.9/90.3/75.7 | dd6644eb |
mobile_pose_resnet50_v1b | 71.1/91.3/78.7 | 72.4/92.3/79.8 | ec8809df |
mobile_pose_mobilenet1.0 | 64.1/88.1/71.2 | 65.7/89.2/73.4 | b399bac7 |
mobile_pose_mobilenetv2_1.0 | 63.7/88.1/71.0 | 65.0/89.2/72.3 | 4acdc130 |
mobile_pose_mobilenetv3_large | 63.7/88.9/70.8 | 64.5/89.0/72.0 | 1ca004dc |
mobile_pose_mobilenetv3_small | 54.3/83.7/59.4 | 55.6/84.7/61.7 | b1b148a9 |
By replacing the backbone network and using a pixel shuffle layer instead of deconvolution, we obtain models that are very fast. These models are suitable for edge-device applications; tutorials on deployment will come soon.
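A hedged sketch of running one of these models with the existing simple-pose helpers (the detector choice and image path are placeholders):

```python
import mxnet as mx
from gluoncv import model_zoo, data
from gluoncv.data.transforms.pose import detector_to_simple_pose, heatmap_to_coord

detector = model_zoo.get_model('ssd_512_mobilenet1.0_coco', pretrained=True)
detector.reset_class(['person'], reuse_weights=['person'])  # keep only persons
pose_net = model_zoo.get_model('mobile_pose_mobilenetv3_small', pretrained=True)

x, img = data.transforms.presets.ssd.load_test('soccer.png', short=512)
class_ids, scores, bboxes = detector(x)
pose_input, upscale_bbox = detector_to_simple_pose(img, class_ids, scores, bboxes)
heatmaps = pose_net(pose_input)
coords, confidence = heatmap_to_coord(heatmaps, upscale_bbox)
```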
More INT8 quantized models
https://gluon-cv.mxnet.io/build/examples_deployment/int8_inference.html
The CPU performance below was benchmarked on an AWS EC2 C5.12xlarge instance with 24 physical cores.
Note that you will need a nightly build of MXNet to properly use these new features.
Model | Dataset | Batch Size | Speedup (INT8/FP32) | FP32 Accuracy | INT8 Accuracy |
---|---|---|---|---|---|
simple_pose_resnet18_v1b | COCO Keypoint | 128 | 2.55 | 66.3 | 65.9 |
simple_pose_resnet50_v1b | COCO Keypoint | 128 | 3.50 | 71.0 | 70.6 |
simple_pose_resnet50_v1d | COCO Keypoint | 128 | 5.89 | 71.6 | 71.4 |
simple_pose_resnet101_v1b | COCO Keypoint | 128 | 4.07 | 72.4 | 72.2 |
simple_pose_resnet101_v1d | COCO Keypoint | 128 | 5.97 | 73.0 | 72.7 |
vgg16_ucf101 | UCF101 | 64 | 4.46 | 81.86 | 81.41 |
inceptionv3_ucf101 | UCF101 | 64 | 5.16 | 86.92 | 86.55 |
resnet18_v1b_kinetics400 | Kinetics400 | 64 | 5.24 | 63.29 | 63.14 |
resnet50_v1b_kinetics400 | Kinetics400 | 64 | 6.78 | 68.08 | 68.15 |
inceptionv3_kinetics400 | Kinetics400 | 64 | 5.29 | 67.93 | 67.92 |
For pose estimation models, the accuracy metric is OKS AP without flip. Quantized 2D video action recognition models are calibrated with num-segments=3 (ResNet-based models use 7).
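A hedged sketch of loading one of the quantized models, assuming the `_int8` naming convention from the model zoo and a nightly MXNet build with MKL-DNN enabled:

```python
import mxnet as mx
from gluoncv.model_zoo import get_model

net = get_model('simple_pose_resnet18_v1b_int8', pretrained=True)

# Same NCHW input contract as the FP32 model; the benchmark above used
# batch size 128, but any batch size works.
x = mx.nd.random.uniform(shape=(1, 3, 256, 192))
heatmaps = net(x)
print('Output shape:', heatmaps.shape)
```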
Bug fixes and Improvements
- Performance of PSPNet with a ResNet101 backbone on Cityscapes (semantic segmentation) is improved from 77.1% to 79.9% mIoU, higher than the number reported in the original paper.
- We will deprecate Python 2 support in the next release.