An optical character recognition (OCR) system that excels at reading handwritten English text. It stacks modified versions of several pre-existing OCR technologies (DocTR, PaddleOCR, PyTesseract, and TrOCR) into a single pipeline that significantly improves overall performance.
To accomplish this, I performed the following steps:
- I thoroughly researched pre-existing OCR technologies and selected the best freely available tools.
- To identify the combination of OCR tools that maximized detection and recognition accuracy, I started by creating an evaluation module.
- I then collected and manually annotated datasets.
- I also created several support tools:
  - Annotation formatter (for converting annotated datasets into usable formats)
  - Multi-file OCR
  - Image processor
- Finally, I iteratively trained the machine learning models, tuning the training parameters until I arrived at the final product.
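The evaluation step above can be pictured with a small sketch. This is not the project's actual evaluation module: the metric (character error rate based on Levenshtein distance), the function names, and the idea of passing each candidate pipeline as a plain callable are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(references: Sequence[str],
                         hypotheses: Sequence[str]) -> float:
    """Total edits divided by total reference characters (lower is better)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0


def rank_pipelines(pipelines: Dict[str, Callable[[object], str]],
                   samples: List[Tuple[object, str]]) -> List[Tuple[str, float]]:
    """Score each hypothetical pipeline on (image, ground_truth) pairs.

    `pipelines` maps a name to any callable that turns an image into text;
    the real tools (DocTR, PaddleOCR, ...) would have to be wrapped to fit.
    """
    truths = [truth for _, truth in samples]
    scores = {
        name: character_error_rate(truths, [run(img) for img, _ in samples])
        for name, run in pipelines.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

Each detector and recognizer combination would be wrapped as one entry in `pipelines`, and the combination with the lowest error rate comes first in the returned list.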
This system outperformed the best freely available tools by a remarkable 22%, as illustrated in the following diagram.
The detailed performance reports of these tools can be found in their corresponding `eval.json` files, which are located in the `eval_output` folder.
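One way to gather these reports for side-by-side comparison is sketched below. The folder layout (one subfolder per tool, each holding its own `eval.json`) and the helper name `load_eval_reports` are assumptions; the sketch makes no assumption about the fields inside each report and simply returns the parsed JSON.

```python
import json
from pathlib import Path


def load_eval_reports(root: str = "eval_output") -> dict:
    """Collect every eval.json found under `root`, keyed by its folder.

    Assumed layout: eval_output/<tool_name>/eval.json. The report schema
    is not assumed; each value is whatever json.load returns.
    """
    root_path = Path(root)
    reports = {}
    for path in sorted(root_path.rglob("eval.json")):
        key = str(path.parent.relative_to(root_path))
        with path.open(encoding="utf-8") as handle:
            reports[key] = json.load(handle)
    return reports
```

Calling `load_eval_reports()` from the repository root would then return a dictionary that can be printed or compared tool by tool.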
The pre-existing OCR tools researched for this project are listed in the following table.
Author: Nalin Malla; Additional Credit: Thank you to Mr. Aayush Baral for training the initial version of the custom Paddle detection model.