This is a text spotting model that simultaneously detects and recognizes text. The model detects symbol sequences separated by spaces and recognizes them without a dictionary. The model is built on top of the Mask-RCNN framework with an additional attention-based text recognition head.
The symbol set is alphanumeric: `0123456789abcdefghijklmnopqrstuvwxyz`.
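Because recognition is dictionary-free and limited to these 36 symbols, ground-truth words that contain uppercase letters or punctuation cannot be reproduced exactly. The following is a minimal sketch of mapping a reference transcription onto the symbol set before comparing it with a prediction. It assumes case-insensitive comparison and that every other character is simply dropped; `normalize_transcription` and `is_representable` are hypothetical helpers, not part of any released tooling.

```python
import string

# Symbol set quoted above: ten digits followed by 26 lowercase Latin letters.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
assert ALPHABET == string.digits + string.ascii_lowercase


def normalize_transcription(text: str) -> str:
    """Map a reference word onto the symbol set.

    Assumption: matching is case-insensitive and characters outside the
    alphanumeric set (punctuation, accented letters, ...) are dropped.
    """
    return "".join(ch for ch in text.lower() if ch in ALPHABET)


def is_representable(word: str) -> bool:
    """True if the model could emit `word` exactly, character by character."""
    return all(ch in ALPHABET for ch in word)
```

For example, `normalize_transcription("Hello, World!")` returns `"helloworld"`, while `is_representable("ab-c")` is `False` because `-` is outside the set.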
This model is a Mask-RCNN-based text detector with a ResNet50 backbone and an additional output of text features.
Metric | Value |
---|---|
Word spotting hmean ICDAR2015, without a dictionary | 59.04% |
Detection hmean ICDAR2015 | 87.09% |
GFlops | 185.169 |
MParams | 26.497 |
Source framework | PyTorch* |
Word spotting hmean is defined and measured according to the Incidental Scene Text (ICDAR2015) challenge.
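Hmean is the harmonic mean of precision P and recall R, that is 2PR / (P + R). The sketch below covers only this final aggregation step. It assumes the number of correctly spotted words (predictions whose box matches a ground-truth box and whose transcription matches the reference) has already been counted, and it ignores the "don't care" regions that the official ICDAR2015 evaluation excludes. The function name `hmean` is hypothetical.

```python
def hmean(num_matched: int, num_predicted: int, num_ground_truth: int) -> float:
    """Harmonic mean of precision and recall.

    Simplification: no "don't care" handling; matches are counted upstream.
    """
    # Precision: fraction of predictions that are correct.
    precision = num_matched / num_predicted if num_predicted else 0.0
    # Recall: fraction of ground-truth words that were found.
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With 50 correct words out of 80 predictions and 100 ground-truth words, P = 0.625 and R = 0.5, so the hmean is about 0.556: the harmonic mean is pulled toward the lower of the two values.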
Inputs:

- Name: `im_data`, shape: [1x3x768x1280]. An input image in the [1xCxHxW] format. The expected channel order is BGR.
- Name: `im_info`, shape: [1x3]. Image information: processed image height, processed image width, and processed image scale with respect to the original image resolution.
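A minimal preprocessing sketch for filling `im_data` and `im_info` with NumPy. It assumes the source image is already an HxWx3 `uint8` array in BGR order (as returned by OpenCV's `cv2.imread`), that it is resized with its aspect ratio preserved and zero-padded at the bottom and right to 768x1280, and that the "processed" height and width in `im_info` are the resized size before padding. These choices and the helper name `preprocess` are assumptions, not part of the model specification; nearest-neighbor resizing is used only to keep the example NumPy-only.

```python
import numpy as np

INPUT_H, INPUT_W = 768, 1280  # spatial size of im_data, [1x3x768x1280]


def preprocess(image_bgr: np.ndarray):
    """Return (im_data, im_info) for an HxWx3 uint8 image in BGR order.

    Assumptions: aspect-ratio-preserving resize, zero padding at the bottom
    and right, and im_info = [resized height, resized width, scale].
    """
    h, w = image_bgr.shape[:2]
    scale = min(INPUT_H / h, INPUT_W / w)
    new_h = min(INPUT_H, max(1, int(round(h * scale))))
    new_w = min(INPUT_W, max(1, int(round(w * scale))))

    # Nearest-neighbor resize via index arithmetic.
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    resized = image_bgr[rows[:, None], cols[None, :]]

    # Zero-pad to the fixed input size; channels stay in BGR order.
    padded = np.zeros((INPUT_H, INPUT_W, 3), dtype=np.float32)
    padded[:new_h, :new_w] = resized

    # HWC -> CHW, then add the batch dimension: [1, 3, 768, 1280].
    im_data = np.ascontiguousarray(padded.transpose(2, 0, 1)[np.newaxis])
    im_info = np.array([[new_h, new_w, scale]], dtype=np.float32)
    return im_data, im_info
```

For a 480x640 image the scale is min(768/480, 1280/640) = 1.6, so the image is resized to 768x1024 and padded with zeros on the right up to 768x1280.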
Outputs:

- Name: `classes`, shape: [100]. Contiguous integer class ID for every detected object, `0` for background (no object detected).
- Name: `scores`, shape: [100]. Detection confidence scores in the [0, 1] range for every object.
- Name: `boxes`, shape: [100x4]. Bounding boxes around every detected object in the (top_left_x, top_left_y, bottom_right_x, bottom_right_y) format.
- Name: `raw_masks`, shape: [100x2x28x28]. Segmentation heatmaps for all classes for every output bounding box.
- Name: `text_features`, shape: [100x64x28x28]. Text features that are fed to a text recognition head.
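A sketch of how these outputs might be filtered and mapped back to the original image. It assumes that box coordinates refer to the processed (resized) image, so dividing by the scale stored in `im_info` recovers original-image coordinates, that `raw_masks[i, classes[i]]` is the heatmap for detection `i`, and that heatmap values are probabilities in [0, 1]. The helpers `postprocess` and `paste_mask` and the 0.5 thresholds are hypothetical.

```python
import numpy as np


def postprocess(classes, scores, boxes, raw_masks, im_info, score_threshold=0.5):
    """Keep confident detections and rescale their boxes to the original image.

    Assumptions: class 0 is background, boxes are in processed-image pixels,
    im_info[0, 2] is the resize scale, and raw_masks[i, class_id] is the
    28x28 heatmap of detection i. score_threshold is a hypothetical default.
    """
    classes = np.asarray(classes).astype(np.int64)
    scores = np.asarray(scores, dtype=np.float32)
    boxes = np.asarray(boxes, dtype=np.float32)
    raw_masks = np.asarray(raw_masks, dtype=np.float32)
    scale = float(np.asarray(im_info, dtype=np.float32)[0, 2])

    keep = np.nonzero((classes > 0) & (scores >= score_threshold))[0]
    detections = []
    for i in keep:
        class_id = int(classes[i])
        detections.append({
            "class_id": class_id,
            "score": float(scores[i]),
            "box": boxes[i] / scale,             # original-image pixels
            "heatmap": raw_masks[i, class_id],   # 28x28 heatmap for this class
        })
    return detections


def paste_mask(heatmap, box, image_h, image_w, threshold=0.5):
    """Stretch a heatmap over its box (nearest neighbor) and binarize it.

    Assumes heatmap values are probabilities in [0, 1]; threshold is hypothetical.
    """
    x0, y0, x1, y1 = (int(round(float(v))) for v in box)
    mask = np.zeros((image_h, image_w), dtype=bool)
    box_w, box_h = x1 - x0, y1 - y0
    if box_w <= 0 or box_h <= 0:
        return mask
    # Pixel coordinates covered by the box, clipped to the image.
    ys = np.arange(max(y0, 0), min(y1, image_h))
    xs = np.arange(max(x0, 0), min(x1, image_w))
    if ys.size == 0 or xs.size == 0:
        return mask
    mask_h, mask_w = heatmap.shape
    # Map each pixel to a heatmap cell relative to the unclipped box.
    rows = (ys - y0) * mask_h // box_h
    cols = (xs - x0) * mask_w // box_w
    mask[ys[:, None], xs[None, :]] = heatmap[rows[:, None], cols[None, :]] > threshold
    return mask
```

Mapping pixels relative to the unclipped box keeps the heatmap aligned when a box extends past the image border; clipping first and then stretching the heatmap would distort partially visible text instances.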
[*] Other names and brands may be claimed as the property of others.