Merge pull request #32 from Topdu/openocr_svtrv2

[Features] add Openocrv1 and svtrv2
Topdu · Nov 23, 2024 · 8c21766
2 parents 61716c2 + 5dcdf8f
commit 8c21766

Showing 12 changed files with 2,503 additions and 260 deletions.
diff --git a/README.md b/README.md
@@ -1,15 +1,34 @@
 # OpenOCR
 
-OpenOCR aims to establish a unified training and evaluation benchmark for scene text detection and recognition algorithms, at the same time, serves as the official code repository for the OCR team from the [FVL](https://fvl.fudan.edu.cn) Laboratory, Fudan University.
-
-We are actively developing and refining it and expect to release the first version as soon as possible.
+We aim to establish a unified benchmark for training and evaluating models for scene text detection and recognition. Based on this benchmark, we introduce an accurate and efficient general OCR system, OpenOCR. Additionally, this repository will serve as the official codebase for the OCR team from the [FVL](https://fvl.fudan.edu.cn) Laboratory, Fudan University.
 
 We sincerely welcome the researcher to recommend OCR or relevant algorithms and point out any potential factual errors or bugs. Upon receiving the suggestions, we will promptly evaluate and critically reproduce them. We look forward to collaborating with you to advance the development of OpenOCR and continuously contribute to the OCR community!
 
+## Features
+
+- 🔥**OpenOCR: A general OCR system for accuracy and efficiency**
+  - ⚡\[[Quick Start](#quick-start)\] \[[Demo](<>)(TODO)\]
+  - [Introduction](./docs/openocr.md)
+    - A practical model built on SVTRv2.
+    - Outperforms [PP-OCRv4](<>) released by [PaddleOCR](<>) by 4.5% on the [OCR competition leaderboard](<>).
+    - [x] Supporting Chinese and English text detection and recognition.
+    - [x] Providing server model and mobile model.
+    - [ ] Fine-tuning OpenOCR on a custom dataset
+    - [ ] Export to ONNX engine
+- 🔥**SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition**
+  - \[[Paper](./configs/rec/svtrv2/SVTRv2.pdf)\] \[[Model](./configs/rec/svtrv2/readme.md#11-models-and-results)\] \[[Config, Training and Inference](./configs/rec/svtrv2/readme.md#3-model-training--evaluation)\]
+  - [Introduction](./docs/svtrv2.md)
+    - Developing a unified training and evaluation benchmark for Scene Text Recognition
+    - Supporting 24 Scene Text Recognition methods trained from scratch on large-scale real datasets; the latest methods will continue to be added.
+    - Improving results by 20-30% compared to training on synthetic datasets.
+    - Towards Arbitrary-Shaped Text Recognition and Language modeling with a Single Visual Model.
+    - Surpasses Attention-based Decoder Methods across challenging scenarios in terms of accuracy and speed
+  - [Get Started](./docs/svtrv2.md#get-started-with-training-a-sota-scene-text-recognition-model-from-scratch) with training a SoTA Scene Text Recognition model from scratch.
+
 ## Ours STR algorithms
 
 - [**DPTR**](<>) (*Shuai Zhao, Yongkun Du, Zhineng Chen\*, Yu-Gang Jiang. Decoder Pre-Training with only Text for Scene Text Recognition,* ACM MM 2024. [paper](https://arxiv.org/abs/2408.05706))
-- [**IGTR**](./configs/rec/igtr/) (*Yongkun Du, Zhineng Chen\*, Yuchen Su, Caiyan Jia, Yu-Gang Jiang. Instruction-Guided Scene Text Recognition,* Under TPAMI minor revision 2024. [Doc](./configs/rec/igtr/readme.md), [paper](https://arxiv.org/abs/2401.17851))
+- [**IGTR**](./configs/rec/igtr/) (*Yongkun Du, Zhineng Chen\*, Yuchen Su, Caiyan Jia, Yu-Gang Jiang. Instruction-Guided Scene Text Recognition,* Under TPAMI minor revison 2024. [Doc](./configs/rec/igtr/readme.md), [paper](https://arxiv.org/abs/2401.17851))
 - [**SVTRv2**](./configs/rec/svtrv2) (*Yongkun Du, Zhineng Chen\*, Hongtao Xie, Caiyan Jia, Yu-Gang Jiang. SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition,* 2024. [paper](./configs/rec/svtrv2/SVTRv2.pdf))
 - [**SMTR&FocalSVTR**](./configs/rec/smtr/) (*Yongkun Du, Zhineng Chen\*, Caiyan Jia, Xieping Gao, Yu-Gang Jiang. Out of Length Text Recognition with Sub-String Matching,* 2024. [paper](https://arxiv.org/abs/2407.12317))
 - [**CDistNet**](./configs/rec/cdistnet/) (*Tianlun Zheng, Zhineng Chen\*, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang. CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition,* IJCV 2024. [paper](https://link.springer.com/article/10.1007/s11263-023-01880-0))
@@ -19,9 +38,78 @@ We sincerely welcome the researcher to recommend OCR or relevant algorithms and
 - [**SVTR**](./configs/rec/svtr/) (*Yongkun Du, Zhineng Chen\*, Caiyan Jia, Xiaoting Yin, Tianlun Zheng, Chenxia Li, Yuning Du, Yu-Gang Jiang. SVTR: Scene Text Recognition with a Single Visual Model,* IJCAI 2022 (Long). [PaddleOCR Doc](https://github.com/Topdu/PaddleOCR/blob/main/doc/doc_ch/algorithm_rec_svtr.md), [paper](https://www.ijcai.org/proceedings/2022/124))
 - [**NRTR**](./configs/rec/nrtr/) (*Fenfen Sheng, Zhineng Chen\*, Bo Xu. NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition,* ICDAR 2019. [paper](https://arxiv.org/abs/1806.00926))
 
-## STR
+## Recent Updates
+
+- **🔥 2024.11.23 release notes**:
+  - **OpenOCR: A general OCR system for accuracy and efficiency**
+    - ⚡\[[Quick Start](#quick-start)\] \[[Demo](<>)(TODO)\]
+    - [Introduction](./docs/openocr.md)
+  - **SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition**
+    - \[[Paper](./configs/rec/svtrv2/SVTRv2.pdf)\] \[[Model](./configs/rec/svtrv2/readme.md#11-models-and-results)\] \[[Config, Training and Inference](./configs/rec/svtrv2/readme.md#3-model-training--evaluation)\]
+    - [Introduction](./docs/svtrv2.md)
+    - [Get Started](./docs/svtrv2.md#get-started-with-training-a-sota-scene-text-recognition-model-from-scratch) with training a SoTA Scene Text Recognition model from scratch.
+
+## ⚡[Quick Start](./docs/openocr.md#quick-start)
+
+#### Dependencies:
+
+- [PyTorch](http://pytorch.org/) version >= 1.13.0
+- Python version >= 3.7
+
+```shell
+conda create -n openocr python==3.8
+conda activate openocr
+conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
+```
+
+After installing dependencies, the following two installation methods are available. Either one can be chosen.
+
+#### 1. Python Modules
+
+```shell
+pip install openocr-python
+```
+
+**Usage**:
+
+```python
+from openocr import OpenOCR
+
+engine = OpenOCR()
 
-Reproduction schedule:
+img_path = '/path/img_fold'  # path to an image folder or to a single image file
+result, elapse = engine(img_path)
+print(result)
+print(elapse)
+
+# Server mode
+engine = OpenOCR(mode='server')
+```
+
+#### 2. Clone this repository:
+
+```shell
+git clone https://github.com/Topdu/OpenOCR.git
+cd OpenOCR
+pip install -r requirements.txt
+```
+
+**Usage**:
+
+```shell
+# OpenOCR system: Det + Rec model (each path below is an image folder or a single image file)
+python tools/infer_e2e.py --img_path=/path/img_fold
+
+# Det model
+python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.infer_img=/path/img_fold
+
+# Rec model
+python tools/infer_rec.py --c ./configs/rec/svtrv2/repsvtr_ch.yml --o Global.infer_img=/path/img_fold
+```
+
+## Reproduction schedule:
+
+### Scene Text Recognition
 
 | Method                                        | Venue                                                                                          | Training | Evaluation | Contributor                                 |
 | --------------------------------------------- | ---------------------------------------------------------------------------------------------- | -------- | ---------- | ------------------------------------------- |
@@ -56,21 +144,25 @@ Reproduction schedule:
 | [IGTR](./configs/rec/igtr/)                   | [2024](https://arxiv.org/abs/2401.17851)                                                       | ✅       | ✅         |                                             |
 | [SMTR](./configs/rec/smtr/)                   | [2024](https://arxiv.org/abs/2407.12317)                                                       | ✅       | ✅         |                                             |
 | [FocalSVTR-CTC](./configs/rec/svtrs/)         | [2024](https://arxiv.org/abs/2407.12317)                                                       | ✅       | ✅         |                                             |
-| [SVTRv2](./configs/rec/svtrv2/)               | 2024                                                                                           | ✅       | ✅         |                                             |
+| [SVTRv2](./configs/rec/svtrv2/)               | [2024](./configs/rec/svtrv2/SVTRv2.pdf)                                                        | ✅       | ✅         |                                             |
 | [ResNet+Trans-CTC](./configs/rec/svtrs/)      |                                                                                                | ✅       | ✅         |                                             |
 | [ViT-CTC](./configs/rec/svtrs/)               |                                                                                                | ✅       | ✅         |                                             |
 
-### Contributors
+#### Contributors
 
 ______________________________________________________________________
 
 Yiming Lei ([pretto0](https://github.com/pretto0)) and Xingsong Ye ([YesianRohn](https://github.com/YesianRohn)) from the [FVL](https://fvl.fudan.edu.cn) Laboratory, Fudan University, under the guidance of Professor Zhineng Chen, completed the majority of the algorithm reproduction work. Grateful for their outstanding contributions.
 
-______________________________________________________________________
+### Scene Text Detection (STD)
 
-## STD
+TODO
 
-## E2E
+### Text Spotting
+
+TODO
+
+______________________________________________________________________
 
 # Acknowledgement
 

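Note on the README usage commands: each of the three inference commands takes exactly one input path, which is either an image folder or a single image file. A minimal sketch of building those commands from Python follows. The script paths, flags, and config files are copied from the README diff above. The helper `build_infer_command` and the `COMMANDS` table are hypothetical and are not part of OpenOCR.

```python
# Hypothetical helper: assembles the argv list for the three inference
# commands shown in the README. Nothing here imports or calls OpenOCR.
COMMANDS = {
    'e2e': ['python', 'tools/infer_e2e.py', '--img_path={path}'],
    'det': ['python', 'tools/infer_det.py',
            '--c', './configs/det/dbnet/repvit_db.yml',
            '--o', 'Global.infer_img={path}'],
    'rec': ['python', 'tools/infer_rec.py',
            '--c', './configs/rec/svtrv2/repsvtr_ch.yml',
            '--o', 'Global.infer_img={path}'],
}


def build_infer_command(task, img_path):
    """Return the argv list for one task.

    img_path is a single path: an image folder or an image file.
    """
    if task not in COMMANDS:
        raise ValueError(f'unknown task: {task!r}')
    # str.replace (not str.format) so paths containing braces stay intact.
    return [part.replace('{path}', img_path) for part in COMMANDS[task]]
```

Passing the resulting list to `subprocess.run(argv, check=True)` from the repository root would launch the command. Because the path is one list element, folders or files with spaces in their names need no extra quoting.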
diff --git a/configs/det/dbnet/repvit_db.yml b/configs/det/dbnet/repvit_db.yml
@@ -10,7 +10,7 @@ Global:
   - 1000
   cal_metric_during_train: false
   checkpoints:
-  pretrained_model: paddle_to_openocr_det_repvit_ch.pth
+  pretrained_model: openocr_det_repvit_ch.pth
   save_inference_dir: null
   use_visualdl: false
   infer_img: ./testA
@@ -53,9 +53,10 @@ Architecture:
 PostProcess:
   name: DBPostProcess
   thresh: 0.3
-  box_thresh: 0.6
+  box_thresh: 0.4
   max_candidates: 1000
   unclip_ratio: 1.5
+  score_mode: 'slow'
 
 # Metric:
 #   name: DetMetric
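Note on the PostProcess hunk above: it lowers `box_thresh` from 0.6 to 0.4 and adds `score_mode: 'slow'`. The sketch below is a pure-Python illustration. It assumes `score_mode` follows the convention of PaddleOCR-style DB post-processing, where 'fast' averages the probability map over a region's bounding rectangle and 'slow' averages only over the pixels inside the region. The function names and the toy probability map are hypothetical and are not taken from OpenOCR.

```python
# Illustrative stand-in for DB-style box scoring; not OpenOCR code.

def score_fast(prob_map, region):
    """Mean probability over the axis-aligned bounding rectangle of the region."""
    rows = [r for r, _ in region]
    cols = [c for _, c in region]
    values = [prob_map[r][c]
              for r in range(min(rows), max(rows) + 1)
              for c in range(min(cols), max(cols) + 1)]
    return sum(values) / len(values)


def score_slow(prob_map, region):
    """Mean probability over exactly the pixels that belong to the region."""
    values = [prob_map[r][c] for r, c in region]
    return sum(values) / len(values)


def keep_box(score, box_thresh=0.4):
    """A candidate box is kept when its score reaches box_thresh."""
    return score >= box_thresh


# Toy data: a diagonal text region whose bounding rectangle also covers
# low-probability background pixels.
prob_map = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.1, 0.9],
]
region = [(0, 0), (1, 1), (2, 2)]

fast = score_fast(prob_map, region)  # averages all 9 pixels
slow = score_slow(prob_map, region)  # averages the 3 region pixels
print(keep_box(fast), keep_box(slow))
```

On this toy region the rectangle average (about 0.34) falls below the 0.4 threshold, while the region-only average (0.9) passes. Under the stated assumption, the two modes differ most for slanted or curved text.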
@@ -144,8 +145,8 @@ Eval:
         # image_shape: [1280, 1280]
         # keep_ratio: True
         # padding: True
-        # limit_side_len: 1280
-        # limit_type: max
+        limit_side_len: 960
+        limit_type: max
     - NormalizeImage:
         scale: 1./255.
         mean: