How to add a new vocab and train with it #608
Hi all, First of all, I tested your project and I admire it. I want to add a Turkish vocab and retrain the recognition model. My question is: can I use the classification training under the references section, or do I have to use the recognition training with prepared data? If the answer is the recognition part, do you have a data generator that produces that format from text? I could not find any documentation, and I did not see an explanation in the discussions. If there is one, please point me to it and forgive me for missing it. Regards
Replies: 1 comment 7 replies
Hi @osman-aktepe, thank you for your interest in docTR!
If you want to train with a Turkish vocab, we first need to integrate this vocab into docTR. Then you need to retrain a recognition model with this vocab on a Turkish dataset (images of word boxes + corresponding annotations).
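To make the vocab step concrete, here is a minimal sketch of what a Turkish vocab string could look like, together with a check that every character in your annotations is covered by it. The names `TURKISH_VOCAB` and `find_unsupported_chars` are hypothetical and not part of docTR, and the exact character set is an assumption you should adapt to your data.

```python
import string

# Assumption: a vocab is a plain string of allowed characters.
# The Turkish-specific letters below are an illustrative choice, not docTR's official vocab.
TURKISH_EXTRA = "çğıöşüÇĞİÖŞÜ"
TURKISH_VOCAB = string.digits + string.ascii_letters + string.punctuation + TURKISH_EXTRA


def find_unsupported_chars(labels, vocab):
    """Return the set of characters used in the labels that are missing from the vocab."""
    vocab_set = set(vocab)
    return {ch for label in labels for ch in label if ch not in vocab_set}


if __name__ == "__main__":
    sample_labels = ["İstanbul", "çay", "ağaç", "señor"]
    # Any character reported here would have to be added to the vocab
    # (or the sample dropped) before training.
    print(find_unsupported_chars(sample_labels, TURKISH_VOCAB))
```

Running this kind of check over the whole dataset before training helps catch labels the model could never predict with the chosen vocab.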
If you don't have such a dataset, you can either collect pictures of Turkish words and annotate them manually (or pass them through another OCR to get them annotated), or generate a fully synthetic dataset by writing words on images with different fonts, sizes, colors, etc. In that case you get the annotations directly, because you know the words you just drew.
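The synthetic option can be sketched roughly as follows with Pillow. This is only an illustration, not a docTR utility: the function name `generate_word_dataset`, the `images/` + `labels.json` layout (a JSON mapping from image file name to word) and all parameter names are assumptions, so please check the layout against what the recognition training script expects. Note that the fallback default font may not contain Turkish glyphs, so for real data you should pass paths to TrueType fonts that do.

```python
import json
import random
from pathlib import Path

from PIL import Image, ImageDraw, ImageFont


def generate_word_dataset(words, out_dir, font_paths=None, samples_per_word=1, seed=0):
    """Render each word as a small image and write a labels.json next to the images."""
    rng = random.Random(seed)
    out_dir = Path(out_dir)
    img_dir = out_dir / "images"
    img_dir.mkdir(parents=True, exist_ok=True)
    labels = {}

    for idx, word in enumerate(words):
        for rep in range(samples_per_word):
            size = rng.randint(24, 48)
            if font_paths:
                font = ImageFont.truetype(rng.choice(font_paths), size)
            else:
                # Fallback only: this font may lack Turkish characters.
                font = ImageFont.load_default()

            # Draw the word on an oversized mask, then crop to the drawn pixels.
            mask = Image.new("L", (max(64, 2 * size * len(word)), 4 * size), 0)
            ImageDraw.Draw(mask).text((size, size), word, fill=255, font=font)
            bbox = mask.getbbox()
            if bbox is None:
                continue  # nothing was rendered (e.g. empty word)
            mask = mask.crop(bbox)

            # Random light background, dark text color and padding.
            pad = rng.randint(2, 8)
            bg = tuple(rng.randint(180, 255) for _ in range(3))
            fg = tuple(rng.randint(0, 90) for _ in range(3))
            img = Image.new("RGB", (mask.width + 2 * pad, mask.height + 2 * pad), bg)
            img.paste(Image.new("RGB", mask.size, fg), (pad, pad), mask)

            name = f"word_{idx:06d}_{rep}.png"
            img.save(img_dir / name)
            labels[name] = word  # the annotation is known because we drew the word

    with open(out_dir / "labels.json", "w", encoding="utf-8") as f:
        json.dump(labels, f, ensure_ascii=False, indent=2)
    return labels
```

For example, `generate_word_dataset(["merhaba", "teşekkürler"], "synthetic_tr", font_paths=["/path/to/font.ttf"], samples_per_word=5)` would write five crops per word, where the font path is a placeholder for a font you have installed. Real documents are noisier than this, so adding blur, rotation or textured backgrounds is probably worth it.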
I hope this answers your question! 😄