NalinMalla/Coupled-Handwritten-OCR: Optical character recognition (OCR) software that excels at reading handwritten English text. By integrating modified versions of existing technologies such as DocTR, PaddleOCR, PyTesseract, and TrOCR into a single pipeline, overall accuracy was improved by 22%.

Coupled-Handwritten-OCR is optical character recognition (OCR) software that excels at reading handwritten English text. It stacks modified versions of several existing OCR technologies (DocTR, PaddleOCR, PyTesseract, and TrOCR) into one pipeline, chosen to maximize accuracy, which improves overall performance significantly.
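
The README does not publish the pipeline code. As a rough illustration of the "stacking" idea, the sketch below assumes a two-stage design: one engine (for example, a DocTR or PaddleOCR detector) finds text regions, and another (for example, TrOCR) reads each cropped region. The `detect` and `recognize` callables, the `Box` layout, and the reading-order heuristic are hypothetical stand-ins, not the project's actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# HYPOTHETICAL types: a box is (x_min, y_min, x_max, y_max) in pixels, and an
# image is a row-major 2-D list of grayscale values, to stay dependency-free.
Box = Tuple[int, int, int, int]
Image = List[List[int]]


@dataclass
class Word:
    box: Box
    text: str


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by box out of a row-major image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def reading_order(boxes: Sequence[Box], line_tolerance: int = 10) -> List[Box]:
    """Sort boxes top-to-bottom, then left-to-right within a text line.

    A box joins the current line when its top edge is within line_tolerance
    pixels of the top edge of the line's first box.
    """
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if lines and abs(box[1] - lines[-1][0][1]) <= line_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]


def run_pipeline(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[Word]:
    """Detect text regions with one engine, then recognize each crop with another."""
    return [Word(box, recognize(crop(image, box))) for box in reading_order(detect(image))]
```

Keeping the detector and recognizer as plain callables makes it cheap to swap engines, which is what searching for the best detector/recognizer combination requires.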

To accomplish this, the following steps were performed:

1. I thoroughly researched existing OCR technologies and selected the best freely available tools.
2. To identify the combination of OCR tools that maximized detection and recognition accuracy, I first built an evaluation module.
3. I then collected and manually annotated datasets.
4. I also built several supporting tools:
   - Annotation formatter (converts the annotated dataset into usable formats)
   - Multi-file OCR
   - Image processor
5. Finally, I iteratively trained machine learning models and tuned their training parameters to arrive at the final product.
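
The README does not say which metrics the evaluation module reports. Character error rate (CER) and word error rate (WER), both derived from Levenshtein edit distance, are the usual choices for scoring OCR output, so the sketch below assumes them. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions between ref and hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # r has no counterpart in hyp
                curr[j - 1] + 1,         # h has no counterpart in ref
                prev[j - 1] + (r != h),  # substitute r by h (free when equal)
            ))
        prev = curr
    return prev[-1]


def error_rates(references: List[str], hypotheses: List[str]) -> Dict[str, float]:
    """Corpus-level character and word error rates (lower is better)."""
    pairs = list(zip(references, hypotheses))
    char_edits = sum(levenshtein(r, h) for r, h in pairs)
    word_edits = sum(levenshtein(r.split(), h.split()) for r, h in pairs)
    n_chars = sum(len(r) for r in references)
    n_words = sum(len(r.split()) for r in references)
    return {
        "cer": char_edits / max(n_chars, 1),
        "wer": word_edits / max(n_words, 1),
    }
```

Because `levenshtein` accepts any sequence, the same function scores characters (strings) and words (lists of tokens). Summing edits across the whole corpus before dividing keeps short lines from dominating the average.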

This system outperformed the best freely available tools by 22%, as illustrated in the following diagram. Detailed performance reports for each tool are in their corresponding eval.json files, located in the eval_output folder.
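
The README does not state whether "22%" is an absolute gain in percentage points or a gain relative to the best baseline. The sketch below reads one score per tool and reports both. It assumes a hypothetical layout (eval_output/<tool>/eval.json, each file with a top-level numeric "accuracy" field); the real eval.json schema may differ.

```python
import json
from pathlib import Path
from typing import Dict


def load_scores(eval_dir: Path, metric: str = "accuracy") -> Dict[str, float]:
    """Read one score per tool from eval_dir/<tool>/eval.json.

    ASSUMPTION: one sub-folder per tool, and each eval.json holds a top-level
    numeric field named by `metric`.
    """
    scores = {}
    for path in sorted(eval_dir.glob("*/eval.json")):
        with path.open(encoding="utf-8") as f:
            scores[path.parent.name] = float(json.load(f)[metric])
    return scores


def improvement(ours: float, baseline: float) -> Dict[str, float]:
    """Express a gain as absolute percentage points and as a percent of the baseline."""
    return {
        "absolute_points": (ours - baseline) * 100,
        "relative_percent": (ours - baseline) / baseline * 100,
    }
```

For example, with a hypothetical "coupled" folder for the combined system, `scores = load_scores(Path("eval_output"))`, `ours = scores.pop("coupled")`, and `improvement(ours, max(scores.values()))` compare it against the strongest single tool.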

Various pre-existing OCR tools were researched for this project. They are listed in the following table.

Author: Nalin Malla. Additional credit: thanks to Mr. Aayush Baral for training the initial version of the custom Paddle detection model.
