
How to have new Vocab and its Training #608

Answered by charlesmindee
osman-aktepe asked this question in Q&A

Hi @osman-aktepe, thank you for your interest in docTR!

If you want to train with a Turkish vocab, we first need to integrate this vocab into docTR. Then you need to retrain a recognition model with this vocab on a Turkish dataset (images of word boxes + corresponding annotations).
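To make "integrating the vocab" concrete: in the docTR versions I'm aware of, built-in character sets are plain strings stored in the `VOCABS` dictionary of `doctr.datasets`, and a recognition model can only output characters that appear in its vocab. The sketch below builds a candidate Turkish vocab string and checks that every character in your annotations is covered. The exact character set (`TURKISH_VOCAB`, including the circumflex letters used in some loanwords) and the helper `missing_chars` are illustrative assumptions, not what docTR ships.

```python
import string

# Hypothetical Turkish vocab (assumption, not docTR's official set):
# digits, ASCII letters, Turkish-specific letters, circumflex vowels
# found in loanwords (e.g. "kâr"), and ASCII punctuation.
TURKISH_SPECIFIC = "çğıöşüÇĞİÖŞÜ" + "âîûÂÎÛ"

TURKISH_VOCAB = (
    string.digits
    + string.ascii_letters
    + TURKISH_SPECIFIC
    + string.punctuation
)


def missing_chars(labels, vocab=TURKISH_VOCAB):
    """Return the set of characters in `labels` that `vocab` cannot encode."""
    allowed = set(vocab)
    return {ch for word in labels for ch in word if ch not in allowed}


if __name__ == "__main__":
    # "€" is not ASCII punctuation, so it is reported as missing.
    print(missing_chars(["İstanbul", "çiçek", "Ankara€"]))
```

Running such a check before training is cheap and catches annotations that the model could never learn to predict.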
If you don't have such a dataset, you can either collect pictures of Turkish words and annotate them manually (or pass them through another OCR to get them annotated), or generate a fully synthetic dataset by writing words on images with different fonts, sizes, colors, etc. In the synthetic case you get the annotations directly, because you know which words you just drew.
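A minimal sketch of the synthetic route, assuming you already have a list of Turkish words and some font files: for each sample we randomly pick a word, font, size and colors, and the label is simply the chosen word. As far as I know, docTR's recognition reference scripts expect a folder of images plus a `labels.json` mapping each file name to its word, so that is the output format here. `RenderSpec`, `make_synthetic_specs`, `labels_json`, the file-name pattern and the value ranges are all hypothetical; the actual drawing step is left to an image library.

```python
import json
import random
from dataclasses import dataclass


@dataclass
class RenderSpec:
    """Parameters for drawing one word (hypothetical structure)."""
    word: str
    font: str          # path to a font file (assumed to exist)
    size: int          # font size in pixels
    color: tuple       # RGB text color
    background: tuple  # RGB background color


def make_synthetic_specs(words, fonts, n, seed=0):
    """Sample n render specs, keyed by the image file name they will be saved as."""
    rng = random.Random(seed)
    specs = {}
    for i in range(n):
        specs[f"synth_{i:05d}.png"] = RenderSpec(
            word=rng.choice(words),
            font=rng.choice(fonts),
            size=rng.randint(16, 48),
            # Dark text on a light background keeps the contrast readable.
            color=tuple(rng.randint(0, 100) for _ in range(3)),
            background=tuple(rng.randint(180, 255) for _ in range(3)),
        )
    return specs


def labels_json(specs):
    """The annotation of each image is just the word that was drawn on it."""
    labels = {name: spec.word for name, spec in specs.items()}
    # ensure_ascii=False keeps letters such as "ğ" or "İ" human-readable.
    return json.dumps(labels, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    specs = make_synthetic_specs(["çiçek", "İstanbul", "güneş"], ["DejaVuSans.ttf"], n=3)
    print(labels_json(specs))
```

Each `RenderSpec` can then be drawn with, for example, Pillow's `ImageDraw.Draw(img).text(...)` using `ImageFont.truetype(spec.font, spec.size)`. Fixing the seed makes the dataset reproducible, and adding blur, rotation or noise afterwards usually helps the model generalize to real photos.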

I hope this answers your question! 😄

Replies: 1 comment, 7 replies

@osman-aktepe
@charlesmindee
@osman-aktepe
@charlesmindee
@osman-aktepe

Answer selected by charlesmindee
Category: Q&A
Labels: None yet
2 participants