OCR

Repository of development files and documentation for the Coptic OCR being developed by Coptic Scriptorium.

Much of this work is based on prior work from other researchers that was shared publicly. Please credit the prior researchers along with Coptic Scriptorium.

OCR4All files

Models and training files for OCR4All were developed from original work by Eliese-Sophia Lincke et al. The OCR4All team converted the training files, produced a model for the newer version of OCR4All, and provided them to Coptic Scriptorium.

Citations: Eliese-Sophia Lincke, Kirill Bulert & Marco Büchler, "Optical Character Recognition for Coptic fonts: A multi-source approach for scholarly editions," in: DATeCH2019 Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage, 87-91. open access; DOI: 10.1145/3322905.3322931

Christian Reul, Dennis Christ, Alexander Hartelt, Nico Balbach, Maximilian Wehner, Uwe Springmann, Christoph Wick, Christine Grundig, Andreas Büttner, and Frank Puppe, "OCR4all—An Open-Source Tool Providing a (Semi-)Automatic OCR Workflow for Historical Printings" Appl. Sci. 2019, 9(22), 4853; https://doi.org/10.3390/app9224853

File Structures

Data in the "Processed OCR" directory have been OCR'd and ground truth has been produced.

Meta files within the subdirectories contain the metadata for each document.
These documents can be routed through the publication process when ready (see the publication issue threads for more information).

The other directories, named after editions and editors, contain OCR input, OCR output, and unprocessed results (these are ground truth, with no post-processing applied afterward). These documents have been or will be uploaded to GitDox. Check the GitHub repository information in GitDox for the location of each document.

As of August 2024, the Sahidic documents from Budge editions are all in the Budge-dev repository.
Use the gitdox subdirectory here for documents manually edited in GitDox that are not from a Budge edition.
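
The layout described above can be sketched as a small helper that labels each top-level directory of a local clone. This is only an illustration: the function name `classify_top_level` and the role descriptions are hypothetical, the description of `models` is an assumption, and any directory not named in this README is simply treated as an edition/editor directory.

```python
from pathlib import Path

# Directory names come from this README; the descriptions are paraphrases.
KNOWN_ROLES = {
    "Processed OCR": "OCR'd documents with ground truth produced",
    "gitdox": "documents manually edited in GitDox (not from a Budge edition)",
    "models": "models (assumed: OCR4All models and training files)",
}

# Fallback label for every other directory (an assumption of this sketch).
EDITION_ROLE = "edition/editor directory: OCR input, output, unprocessed results"


def classify_top_level(repo_root):
    """Map each visible top-level directory name to a short role description."""
    roles = {}
    for entry in sorted(Path(repo_root).iterdir()):
        # Skip plain files (LICENSE, README.md) and hidden entries (.git).
        if not entry.is_dir() or entry.name.startswith("."):
            continue
        roles[entry.name] = KNOWN_ROLES.get(entry.name, EDITION_ROLE)
    return roles
```

Run from the root of a clone, `classify_top_level(".")` returns a dictionary keyed by directory name, which can be printed to get a quick overview of where each kind of material lives.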

Folders and files

Amelineau Lausiac History OCR Results/Bohairic Lausiac History Results
Amelineau Monks of Egypt OCR Results
Amelineau Vita Isaac OCR Results/Vita Isaac Bohairic Results
Budge Apoc_Bartholomew_Resurrection of JC/Bartholomew Resurrection Jesus Christ Results
Budge Miscellaneous OCR Output/Budge_Misc_ApocPaul/Apoc Paul Input
Budge_Homilies_PseudoTheophilus_On Repentance/PseudoTheophilus On Repentance Results
DeVis_Homelies Coptes de la Vaticane 1922_OCR Results/DeVis_Benjamin Sermon Noces de Cana
Giron_Legendes Coptes OCR Results
Hyvernat and Balesteri Martyrdoms OCR Results
Processed OCR
gitdox
models
.DS_Store
LICENSE
README.md
