How to select the appropriate vocab for a text recognition training #675

Answered by fg-mindee
lfxuan asked this question in Q&A
'Morning @lfxuan 👋

As mentioned by Charles, we would need a bit more information to have a comprehensive answer. But considering your error, I'm guessing you're training on a dataset that has characters outside of the vocab you selected 🤔

You can easily check whether this is the case by printing the string that causes the error and verifying that all of its characters are included in the vocab https://github.com/mindee/doctr/blob/main/doctr/datasets/vocabs.py (the default one in the script is "french") 👍
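To illustrate the check described above, here is a minimal sketch that scans a list of labels and reports every character missing from a vocab string. The helper name `find_unsupported_chars`, the stand-in `demo_vocab`, and the sample labels are all hypothetical and only there for illustration. With docTR installed, you would presumably pass the real vocab string instead (e.g. `VOCABS["french"]` from `doctr.datasets`, which I'm assuming matches the vocab your training script uses).

```python
import string


def find_unsupported_chars(labels, vocab):
    """Map each character missing from `vocab` to the labels that contain it."""
    vocab_set = set(vocab)
    missing = {}
    for label in labels:
        # Characters of this label that the vocab cannot encode
        for char in set(label) - vocab_set:
            missing.setdefault(char, []).append(label)
    return missing


# Stand-in vocab for illustration only (NOT the actual docTR "french" vocab).
# With docTR installed, something like this should work instead:
#   from doctr.datasets import VOCABS
#   demo_vocab = VOCABS["french"]
demo_vocab = string.ascii_letters + string.digits + " "

# Hypothetical labels standing in for the dataset's ground-truth strings
demo_labels = ["hello", "naïve", "prix 5€"]

missing = find_unsupported_chars(demo_labels, demo_vocab)
for char, examples in sorted(missing.items()):
    print(repr(char), "->", examples)
```

If the resulting mapping is empty, every label can be encoded with the chosen vocab; otherwise the listed characters tell you which vocab you would need (or which samples to filter out).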

If your dataset does contain out-of-vocab characters, try selecting a vocab better suited to it; if none exists yet in docTR, we can discuss whether we should extend the existing ones 😁

Have a good day!

Answer selected by lfxuan
Category
Q&A
Labels
ext: references (Related to references folder); topic: text recognition (Related to the task of text recognition)
3 participants