How to select the appropriate vocab for a text recognition training #675

lfxuan · 2021-12-03T07:14:04Z

lfxuan
Dec 3, 2021

Hello! Why does list(map(vocab.index, input_string)) raise "ValueError: substring not found" when training the recognition model?

Answered by fg-mindee

Dec 3, 2021

charlesmindee · 2021-12-03T09:29:04Z

charlesmindee
Dec 3, 2021
Maintainer

Hi @lfxuan,

Thanks for reporting this. Could you please share the following:

- a code sample to reproduce the bug
- your environment (the output of the collect_env.py script)
- if possible, the image you passed to the OCR

Thanks 🙏

1 reply

lfxuan Dec 9, 2021
Author

Thank you for your answer!

fg-mindee · 2021-12-03T09:57:17Z

fg-mindee
Dec 3, 2021

'Morning @lfxuan 👋

As mentioned by Charles, we would need a bit more information to have a comprehensive answer. But considering your error, I'm guessing you're training on a dataset that has characters outside of the vocab you selected 🤔

You can easily check whether this is the case by printing the string that causes the error and then verifying that all of its characters are included in the vocab: https://github.com/mindee/doctr/blob/main/doctr/datasets/vocabs.py (the default one in the script is "french") 👍

If this is the case, try to select a more appropriate vocab for your dataset, and if it doesn't exist yet in docTR, we can discuss whether we should extend the range of it 😁
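The check described above could be sketched roughly as follows. This is a minimal illustration, not docTR code: `find_out_of_vocab` is a hypothetical helper, and `toy_vocab` is a stand-in string rather than the actual "french" vocab. It relies only on the fact that `str.index` raises `ValueError: substring not found` for a character that is not in the string.

```python
def find_out_of_vocab(labels, vocab):
    """Map each offending label to the sorted list of its characters missing from vocab."""
    allowed = set(vocab)
    problems = {}
    for label in labels:
        missing = sorted(set(label) - allowed)
        if missing:
            problems[label] = missing
    return problems


# Stand-in vocab for illustration only; NOT docTR's actual "french" vocab.
toy_vocab = "abcdefghijklmnopqrstuvwxyz0123456789"
labels = ["hello", "hello world", "abc123"]

for label, missing in find_out_of_vocab(labels, toy_vocab).items():
    # repr() makes invisible characters such as spaces easy to spot
    print(repr(label), "->", missing)
```

To run this against a real dataset, one would presumably swap `toy_vocab` for one of the vocab strings defined in the vocabs.py file linked above and pass in the dataset's own labels.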

Have a good day!

1 reply

lfxuan Dec 9, 2021
Author

Thank you very much for your answer! The error occurred because a space was mistakenly added to a label when writing the JSON file, and the "french" vocab has no space character. The problem is now solved.
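For anyone hitting the same issue, a cleanup along these lines might help. It is only a sketch: it assumes the labels are stored as a JSON object mapping image file names to label strings, and `strip_labels` is a hypothetical helper, not part of docTR.

```python
import json


def strip_labels(labels):
    """Return a copy of a {image_name: label} mapping with leading/trailing whitespace removed."""
    return {name: text.strip() for name, text in labels.items()}


# Inline JSON stands in for the contents of a labels file (assumed format).
raw = json.loads('{"img_1.jpg": "hello ", "img_2.jpg": " world"}')
cleaned = strip_labels(raw)
print(json.dumps(cleaned))
```

Note that this only removes spaces at the start or end of a label. Labels with spaces in the middle would still need a vocab that contains the space character.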
