-
Text Detection
- (DBNet++) Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion | 2022 | Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai
- (FCENet) Fourier Contour Embedding for Arbitrary-Shaped Text Detection | 2021 | Yiqin Zhu, Jianyong Chen, Lingyu Liang, Zhanghui Kuang, Lianwen Jin, Wayne Zhang
- ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection | 2020 | Yuxin Wang, Hongtao Xie, Zhengjun Zha, Mengting Xing, Zilong Fu, Yongdong Zhang
- (DRRG) Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection | 2020 | Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Chang Liu, Chun Yang, Hongfa Wang, Xu-Cheng Yin
- (DBNet) Real-time Scene Text Detection with Differentiable Binarization | 2019 | Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai
- (PANet) Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network | 2019 | Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen
- (PSENet) Shape Robust Text Detection with Progressive Scale Expansion Network | 2019 | Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao
- TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes | 2018 | Shangbang Long, Jiaqiang Ruan, Wenjie Zhang, Xin He, Wenhao Wu, Cong Yao
- (SAST) A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning | 2019 | Pengfei Wang, Chengquan Zhang, Fei Qi, Zuming Huang, Mengyi En, Junyu Han, Jingtuo Liu, Errui Ding, Guangming Shi
- EAST: An Efficient and Accurate Scene Text Detector | 2017 | Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, Jiajun Liang
- DeepText: A Unified Framework for Text Proposal Generation and Text Detection in Natural Images | 2016 | Zhuoyao Zhong, Lianwen Jin, Shuye Zhang, Ziyong Feng
- (CTPN) Detecting Text in Natural Image with Connectionist Text Proposal Network | 2016 | Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao
-
Text Recognition
- UTRNet: High-Resolution Urdu Text Recognition In Printed Documents | 2023 | Abdur Rahman, Arjun Ghosh, Chetan Arora
- SVTR: Scene Text Recognition with a Single Visual Model | 2022 | Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Tianlun Zheng, Chenxia Li, Yuning Du, Yu-Gang Jiang
- ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting | 2022 | Shancheng Fang, Zhendong Mao, Hongtao Xie, Yuxin Wang, Chenggang Yan, Yongdong Zhang
- (ABINet) Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition | 2021 | Shancheng Fang, Hongtao Xie, Yuxin Wang, Zhendong Mao, Yongdong Zhang
- Rosetta: Large scale system for text detection and recognition in images | 2019 | Fedor Borisyuk, Albert Gordo, Viswanath Sivakumar
- (CRNN) An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition | 2015 | Baoguang Shi, Xiang Bai, Cong Yao
-
Document Understanding
- LayoutLM: Pre-training of Text and Layout for Document Image Understanding | 2019 | Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models | 2021 | Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei
- (DONUT) OCR-free Document Understanding Transformer | 2021 | Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park
-
Backbone
- DiT: Self-supervised Pre-training for Document Image Transformer | 2022 | Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
- (oCLIP) Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting | 2022 | Chuhui Xue, Wenqing Zhang, Yu Hao, Shijian Lu, Philip Torr, Song Bai
-
Table Analysis
- (MTL-TabNet) An End-to-End Multi-Task Learning Model for Image-based Table Recognition | 2023 | Nam Tuan Ly, Atsuhiro Takasu
- (TableMaster) PINGAN-VCGROUP’S SOLUTION FOR ICDAR 2021 COMPETITION ON SCIENTIFIC LITERATURE PARSING TASK B: TABLE RECOGNITION TO HTML | 2021 | Jiaquan Ye, Xianbiao Qi, Yelin He, Yihao Chen, Dengyi Gu, Peng Gao, Rong Xiao
- CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents | 2020 | Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, Kavita Sultanpure
- TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images | 2020 | Shubham Paliwal, Vishwanath D, Rohit Rahul, Monika Sharma, Lovekesh Vig
-
Datasets
| Paper | Abstract | Language | Date | Usage |
| --- | --- | --- | --- | --- |
| (SynthText) Synthetic Data for Text Localisation in Natural Images | A fast and scalable engine to generate synthetic images of text in clutter. This engine overlays synthetic text onto existing background images in a natural way, accounting for the local 3D scene geometry. | English | 2016 | Det & Rec |
| ICDAR2017 Competition on Multi-lingual scene text detection and script identification | The MLT-2017 dataset is a multi-language dataset. It includes 9 languages representing 6 different scripts. There are 7,200 training images, 1,800 validation images and 9,000 testing images in this dataset. | Multi | 2017 | Det |
| (MSRA-TD500 dataset) Detecting texts of arbitrary orientations in natural images | The MSRA-TD500 dataset is a text detection dataset that contains 300 training images and 200 test images. Text regions are arbitrarily oriented and annotated at sentence level. Unlike the other datasets, it contains both English and Chinese text. | English, Chinese | 2012 | Det |
| DocBank: A Benchmark Dataset for Document Layout Analysis | | English | 2020 | Layout |
| TableBank: A Benchmark Dataset for Table Detection and Recognition | | English | 2019 | Table |
| (PubTabNet) Image-based table recognition: data, model, and evaluation | | | 2019 | Table |
-
Synthetic Data Generation