Add description for MobileNet V4
anhappdev committed May 21, 2024
1 parent 92656e9 commit 6e5afd6
Showing 3 changed files with 15 additions and 1 deletion.
12 changes: 12 additions & 0 deletions flutter/lib/benchmark/info.dart
@@ -39,6 +39,18 @@ class BenchmarkInfo {
           detailsTitle: stringResources.benchInfoImageClassification,
           detailsContent: stringResources.benchInfoImageClassificationDesc,
         );
+      case (BenchmarkId.imageClassificationV2):
+        return BenchmarkLocalizationInfo(
+          name: stringResources.benchNameImageClassification,
+          detailsTitle: stringResources.benchInfoImageClassification,
+          detailsContent: stringResources.benchInfoImageClassificationV2Desc,
+        );
+      case (BenchmarkId.imageClassificationOfflineV2):
+        return BenchmarkLocalizationInfo(
+          name: stringResources.benchNameImageClassificationOffline,
+          detailsTitle: stringResources.benchInfoImageClassification,
+          detailsContent: stringResources.benchInfoImageClassificationV2Desc,
+        );
       case (BenchmarkId.objectDetection):
         return BenchmarkLocalizationInfo(
           name: stringResources.benchNameObjectDetection,
1 change: 1 addition & 0 deletions flutter/lib/l10n/app_en.arb
@@ -120,6 +120,7 @@
   "benchInfoLanguageProcessing": "Language Processing",
   "benchInfoSuperResolution": "Super Resolution",
   "benchInfoImageClassificationDesc": "Image classification picks the best label to describe an input image and is commonly used for photo search and text extraction. The MobileNetEdgeTPU reference model is evaluated on the ImageNet 2012 validation dataset and requires a minimum Top-1 accuracy of 74.66% (98% of the FP32 accuracy of 76.19%). For performance measurements, the app uses a different dataset.\n\nThe MobileNetEdgeTPU network is a descendant of the MobileNet-v2 family that is optimized for low latency on mobile accelerators. The MobileNetEdgeTPU model architecture is based on convolutional layers with inverted residuals and linear bottlenecks, similar to MobileNet v2, but is optimized by introducing fused inverted bottleneck convolutions to improve hardware utilization and by removing hard-swish and squeeze-and-excite blocks.\n\nThe offline variant of image classification has no latency constraints and typically uses batched inference to achieve higher throughput.",
+  "benchInfoImageClassificationV2Desc": "Image classification picks the best label to describe an input image and is commonly used for photo search and text extraction.\n\nThe MobileNetV4-Conv-L model achieves an impressive 83% accuracy on the ImageNet dataset, versus 76% for the prior reference model, MobileNetEdgeTPU. MobileNetV4-Conv-L is designed to perform well across a range of mobile processor types, from CPUs and GPUs to neural accelerators. The MLPerf Mobile working group worked closely with the MobileNetV4 team to ensure optimized performance. This combination of an improved model architecture and collaborative optimization has proven quite potent: although MobileNetV4-Conv-L executes six times as many mathematical operations as its predecessor, MobileNetEdgeTPU, benchmark execution times have increased only by a factor of roughly 4.6.\n\nThe offline variant of image classification has no latency constraints and typically uses batched inference to achieve higher throughput.",
   "benchInfoObjectDetectionDesc": "Object detection draws bounding boxes around recognized objects in an input image, assigning each one a label. This is a common approach for identifying objects in photos and in automotive safety applications. Since v1.0, our reference model has been MobileDets, replacing the v0.7 model (a Single Shot Detector with a MobileNet-v2 feature extractor). The MobileDets object detection task is evaluated on the COCO 2017 validation dataset with an input image resolution of 320x320. It requires a minimum mean Average Precision (mAP) of 27.075% (95% of the FP32 mAP of 28.5%), which is significantly higher than that of the previous model.\n\nThe MobileDets architecture was discovered through neural architecture search targeting object detection. A key feature of MobileDets is that the search space includes both inverted bottleneck blocks and regular convolution operations, which helps improve the accuracy-latency trade-off on several hardware accelerators.",
   "benchInfoImageSegmentationDesc": "Semantic image segmentation partitions an input image into labeled objects at pixel granularity, and is used for complex image manipulation such as red-eye reduction as well as automotive and medical applications. The reference model is the MOSAIC network paired with a tailored feature extraction backbone. It operates on 512x512 resolution input images from the ADE20K validation set and requires a minimum mean Intersection Over Union (mIoU) value of 57.36% (96% of the FP32 mIoU of 59.75%), significantly higher than the previous segmentation model (MobileNetv2-Deeplabv3+).\n\nMOSAIC employs a simple asymmetric encoder-decoder structure which consists of an efficient multi-scale context encoder and a lightweight hybrid decoder to recover spatial details from aggregated information, with multiple lateral connections between the two. The feature extractor is a variant of MobileNet Multi-Hardware, a network built and optimized with neural architecture search. It is further enhanced for image segmentation by reducing the output stride, adding dilated convolutions at the end stage, and halving the feature channels.",
   "benchInfoLanguageProcessingDesc": "Question Answering finds the best answer to an input question based on a body of text, and is commonly employed in applications such as virtual assistants and chatbots. The reference model, MobileBERT, is evaluated on the Stanford Question Answering Dataset (SQuAD) v1.1 Dev-mini. The task requires a minimum F1-score of 87.4% (93% of the FP32 F1-score of 93.08%).\n\nMobileBERT is a streamlined, mobile-optimized version of the larger BERT_LARGE network. It features bottleneck structures and a carefully designed balance between self-attention and feed-forward networks. While BERT is task-agnostic and can be applied to various downstream natural language processing tasks, the MobileBERT variant used in MLPerf is specifically fine-tuned for question answering.",
3 changes: 2 additions & 1 deletion flutter/lib/ui/home/benchmark_info_button.dart
@@ -29,7 +29,8 @@ class BenchmarkInfoButton extends StatelessWidget {
       isDismissible: false,
       enableDrag: false,
       isScrollControlled: true,
-      shape: RoundedRectangleBorder(borderRadius: BorderRadius.circular(30)),
+      shape: const RoundedRectangleBorder(
+          borderRadius: BorderRadius.vertical(top: Radius.circular(24))),
       builder: (context) => Wrap(
         children: [
           Padding(
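For context, the shape change above rounds only the top corners of the modal bottom sheet (so it sits flush with the screen edge) and makes the border object `const`, which is possible because `RoundedRectangleBorder`, `BorderRadius.vertical`, and `Radius.circular` all have const constructors. A minimal, self-contained sketch of how such a shape is typically passed to `showModalBottomSheet` (the helper name and sheet contents are assumptions, not part of this commit):

```dart
import 'package:flutter/material.dart';

// Hypothetical helper illustrating the sheet configuration used above.
void showInfoSheet(BuildContext context) {
  showModalBottomSheet<void>(
    context: context,
    isDismissible: false,
    enableDrag: false,
    isScrollControlled: true,
    // Round only the top corners; the bottom edge stays square so the
    // sheet meets the bottom of the screen without a visible gap.
    shape: const RoundedRectangleBorder(
        borderRadius: BorderRadius.vertical(top: Radius.circular(24))),
    builder: (context) => const Padding(
      padding: EdgeInsets.all(16),
      child: Text('Benchmark info'),
    ),
  );
}
```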
