Hello, I want to fine-tune detection and recognition models on my custom dataset.
Hi @hkhanr 👋
About the recognition format, I'm not sure I understand your question, but the README does provide examples of the format. Keep in mind that text recognition doesn't use any localization information, while text detection does.
So if you have OCR annotations (localization + text) for some documents: for the text detection training, simply drop the text information; for the text recognition training, you will have to crop each localized word.
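For example, here is a rough sketch of that cropping step, assuming word boxes in absolute pixel coordinates; the file names, box format, and the `annotations` variable are just illustrative placeholders to adapt to however your annotations are stored:

```python
# Rough sketch: build recognition crops from word-level OCR annotations.
import os
from PIL import Image

# One (box, transcription) pair per word; boxes as (left, top, right, bottom)
# in absolute pixel coordinates. Purely illustrative values.
annotations = [
    ((120, 45, 310, 90), "Hello"),
    ((330, 45, 520, 90), "there"),
    ((540, 45, 700, 90), "world"),
]

os.makedirs("images", exist_ok=True)
page = Image.open("page_1.jpg")  # the document image these boxes refer to

labels = {}
for i, (box, text) in enumerate(annotations, start=1):
    crop_name = f"crop{i}.jpg"
    page.crop(box).save(os.path.join("images", crop_name))
    labels[crop_name] = text  # keep the transcription for labels.json later
```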
And then, imagine you have a folder called "images" and put the cropped images in it:
crop1.jpg, crop2.jpg, crop3.jpg
Then your labels should be in JSON format in a "labels.json" file next to the "images" folder, with the structure sketched below.
Otherwise, end-to-end OCR training is not available in docTR :) I hope that helps!
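A minimal sketch of that last step, assuming the flat filename-to-transcription mapping described in the recognition training README; the file names and words are placeholders carried over from the cropping sketch above:

```python
import json

# Illustrative only: "labels.json" maps each cropped image file name in
# "images/" to its transcription.
labels = {
    "crop1.jpg": "Hello",
    "crop2.jpg": "there",
    "crop3.jpg": "world",
}

# Written next to the "images" folder.
with open("labels.json", "w", encoding="utf-8") as f:
    json.dump(labels, f, ensure_ascii=False, indent=4)
```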