EvArEST

Everyday Arabic-English Scene Text dataset, from the paper: Arabic Scene Text Recognition in the Deep Learning Era: Analysis on A Novel Dataset

Detection Dataset

The text detection dataset has 510 images, each containing one or more instances of text. Each word is annotated with a four-point polygon that starts at the top-left corner and proceeds clockwise. Each image comes with a text file of three attributes per word, including the four points of the polygon that contains the word and the language of the word.
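The polygon convention described above (four points, starting at the top-left corner and running clockwise) can be checked with a few lines of Python. This is a minimal sketch, not the authors' tooling: the comma-separated line layout and the function names `parse_annotation_line` and `is_clockwise` are assumptions made for illustration, so check them against the actual annotation files before relying on them.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_annotation_line(line: str) -> Tuple[List[Point], List[str]]:
    """Split one annotation line into a 4-point polygon and the remaining fields.

    ASSUMED layout (not confirmed by the dataset description):
        x1,y1,x2,y2,x3,y3,x4,y4,<remaining attributes...>
    """
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 coordinate values, got {len(fields)}")
    coords = [float(v) for v in fields[:8]]
    polygon = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    # Everything after the coordinates is returned untouched.
    return polygon, fields[8:]


def is_clockwise(polygon: List[Point]) -> bool:
    """Return True if the points run clockwise in image coordinates (y grows downward)."""
    # Shoelace sum: with the y axis pointing down, a positive sum
    # corresponds to a visually clockwise ordering.
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        twice_area += x1 * y2 - x2 * y1
    return twice_area > 0
```

A typical use is a sanity pass over every annotation file, flagging polygons for which `is_clockwise` returns `False` before they reach a detector's training pipeline.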

Training Data

Test Data

Recognition Dataset

The text recognition dataset comprises 7,232 cropped word images in both Arabic and English. The ground truth is provided as a text file in which each line contains the image file name and the text in the image. The dataset can be used for Arabic-only text recognition or for bilingual text recognition.
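Reading the ground-truth file and selecting the Arabic subset could look roughly like the sketch below. The delimiter is an assumption: the description only says each line holds the image file name and the text, so this version treats the first whitespace-delimited token as the file name and the rest of the line as the label. The helper names (`parse_groundtruth_line`, `contains_arabic`, `arabic_only`) are hypothetical, and the Arabic check only covers the basic Unicode Arabic block (U+0600 to U+06FF).

```python
from typing import List, Tuple


def parse_groundtruth_line(line: str) -> Tuple[str, str]:
    """Return (image_file_name, text) from one ground-truth line.

    ASSUMPTION: the file name is the first whitespace-delimited token
    and contains no spaces; the remainder of the line is the label.
    """
    parts = line.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("empty ground-truth line")
    name = parts[0]
    text = parts[1] if len(parts) > 1 else ""
    return name, text


def contains_arabic(text: str) -> bool:
    """True if any character falls in the basic Arabic block (U+0600 to U+06FF)."""
    return any("\u0600" <= ch <= "\u06FF" for ch in text)


def arabic_only(samples: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only samples whose label contains Arabic script."""
    return [(name, text) for name, text in samples if contains_arabic(text)]
```

Filtering with `arabic_only` gives the Arabic-only setting; skipping the filter keeps both languages for the bilingual setting.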

Training Data:

Arabic - English

Test Data:

Arabic - English

Synthetic Data

About 200k synthetic images with segmentation maps.

SynthData

Code for Synthetic Data Generation:

https://github.com/HGamal11/Arabic_Synthetic_Data_Generator

Other Resources for Arabic Data

ICDAR 2019 Robust Reading Challenge on Multi-lingual scene text detection and recognition

https://rrc.cvc.uab.es/?ch=15&com=tasks

Synthetic MLT Data

https://github.com/MichalBusta/E2E-MLT

Citation

If you find this dataset useful for your research, please cite

@article{hassan2021arabic,
  title={Arabic Scene Text Recognition in the Deep Learning Era: Analysis on A Novel Dataset},
  author={Hassan, Heba and El-Mahdy, Ahmed and Hussein, Mohamed E},
  journal={IEEE Access},
  year={2021},
  publisher={IEEE}
}
