Pure tesseract branch

This branch contains an implementation of OCR using only Tesseract and image manipulation libraries such as OpenCV.

How to run the code

Install Tesseract

Tesseract must be installed before the code can run. See https://github.com/tesseract-ocr/tessdoc/blob/main/Downloads.md for download instructions. After installing, add the folder containing the Tesseract executable to your PATH environment variable.
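One way to confirm that Tesseract is reachable through PATH is a short standard-library check. This is a minimal sketch, not part of this repository; the function names `find_tesseract` and `tesseract_version` are made up for illustration, and it assumes the executable is named `tesseract`.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def tesseract_version(executable):
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some Tesseract releases print the version banner to stderr rather than stdout.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH")
    else:
        print(f"Found {path}: {tesseract_version(path)}")
```

If the script reports that Tesseract was not found, the PATH entry is most likely missing or the terminal needs to be restarted.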

Move traineddata file

Next, move the .traineddata file from model/tesseract to the tessdata folder of your Tesseract installation. The file comes from https://github.com/DoubangoTelecom/tesseractMRZ and is used under its license (see the TRAINEDDATA_LICENSE file in this repository).
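The copy can also be scripted. The sketch below assumes you pass in both paths yourself, because the tessdata location depends on how Tesseract was installed; `install_traineddata` is a hypothetical helper, not something this repository provides.

```python
import shutil
from pathlib import Path


def install_traineddata(traineddata_file, tessdata_dir):
    """Copy a .traineddata file into a tessdata folder and return the new file's path."""
    source = Path(traineddata_file)
    target_dir = Path(tessdata_dir)
    if source.suffix != ".traineddata":
        raise ValueError(f"{source} is not a .traineddata file")
    if not target_dir.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {target_dir}")
    # shutil.copy2 returns the path of the copied file inside target_dir.
    return Path(shutil.copy2(source, target_dir))
```

Copying into the tessdata folder may require administrator rights, depending on where Tesseract is installed.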

Install python packages

Create virtual environment (first install)

python -m venv venv

Activate virtual environment (every time)

source venv/bin/activate # Unix-like
venv\Scripts\activate # Windows

Set-up (first install)

pip install -e .

Install dependencies

pip install -r requirements.txt # After remote dependency changes

Save your dependency changes

pip freeze > requirements.txt # After local dependency changes

Run main method of a specific file (example):

python -m passport_mrz_reader.pure_tesseract.tesseract_predict

Enable Jupyter widgets

jupyter nbextension enable --py widgetsnbextension --sys-prefix

Open Jupyter notebook (VSCode should also work)

jupyter notebook

Deactivate virtual environment (if desired)

deactivate

Run tests

Tests can be run using the following command:

python -m unittest discover -s passport_mrz_reader/tests
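With its default settings, unittest discovery collects files matching test*.py in the given directory. As a rough sketch of what such a module can look like (not a file from this repository), the example below tests a hypothetical `mrz_check_digit` helper that computes the ICAO 9303 check digit using the repeating weights 7, 3, 1.

```python
import unittest


def mrz_check_digit(field):
    """Hypothetical helper: compute the ICAO 9303 check digit of an MRZ field."""
    weights = (7, 3, 1)
    total = 0
    for index, char in enumerate(field):
        if "0" <= char <= "9":
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10  # A=10 ... Z=35
        elif char == "<":
            value = 0  # the filler character counts as zero
        else:
            raise ValueError(f"invalid MRZ character: {char!r}")
        total += value * weights[index % 3]
    return total % 10


class TestMrzCheckDigit(unittest.TestCase):
    def test_document_number(self):
        # Document number of the ICAO 9303 specimen passport.
        self.assertEqual(mrz_check_digit("L898902C3"), 6)

    def test_filler_counts_as_zero(self):
        self.assertEqual(mrz_check_digit("<<<"), 0)

    def test_rejects_lowercase(self):
        with self.assertRaises(ValueError):
            mrz_check_digit("abc")
```

Saved as, for example, test_check_digit.py inside the tests folder, a module like this would be picked up by the discovery command.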

File structure

data:

Contains the labeled dataset and MRZ images

model:

Contains the different trained models, one folder for every approach
- pure tesseract: Contains a .traineddata file which should be moved to the folder where Tesseract is installed.

src:

- common: Contains functionality that is shared across all implementations.
- pure tesseract: Contains an implementation that uses only Tesseract and image processing libraries such as OpenCV to perform OCR on passport MRZs.
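To give a feel for what OCR with a custom traineddata file involves, here is a minimal sketch that calls the Tesseract command-line tool directly. It is not the implementation in this repository. The function names are hypothetical, and `lang` is assumed to be the name of the .traineddata file without its extension. Page segmentation mode 6 (a single uniform block of text) is only one plausible choice for MRZ crops.

```python
import subprocess


def build_tesseract_command(image_path, lang, psm=6):
    """Build a tesseract CLI call that writes the recognized text to stdout."""
    return [
        "tesseract",
        str(image_path),
        "stdout",  # special output base: print the result instead of writing a file
        "-l", lang,
        "--psm", str(psm),
    ]


def run_ocr(image_path, lang, psm=6):
    """Run Tesseract on an image and return the recognized lines (blank lines removed)."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Preprocessing the image first (for example grayscale conversion and thresholding with OpenCV) typically matters a great deal for MRZ accuracy, which is why the implementation pairs Tesseract with an image processing library.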


Folders and files

Latest commit

History

Repository files navigation

Pure tesseract branch

How to run the code

Install Tesseract

Move traineddata file

Install python packages

Run tests

File structure

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages